Seasonal Notation Similarity Scores

By Daniel Marks

February 26, 2022

Introduction

· This is my second attempt at "Bill James Fusion", which, much like fusion cooking, is when I take 2 different Bill James concepts and combine them into something a little different (and hopefully delicious, although your taste experience may vary).

My initial fusion attempt was combining the Bill James creations often referred to as "Hall of Fame metrics" (Hall of Fame Monitor, Black Ink, Gray Ink, and Hall of Fame Standards) with Similarity Scores. This time, I’m combining traditional Similarity Scores with what Bill used to refer to as "Seasonal Notation", which was simply a player’s stats expressed in a per-162 game context. OK, maybe the concept of expressing statistics on a per-162 game basis isn’t originally Bill’s, but I do believe he came up with the term "Seasonal Notation", so that’s good enough for me.

· This is not a new concept. I do remember seeing Similarity Scores per 162 games as a feature on the now-defunct "Baseball Gauge" (or "Seamheads") web site, although I’m using a different set of categories and penalties in coming up with the scores. Also, I believe they only included it as part of their Negro Leagues Database section on the site, although I’m not positive I’m remembering that 100% correctly, and it’s too late for me to verify that.

· This was my initial attempt at this approach, so I think there’s a lot of room for potential improvement. I suspect many of you would have picked different categories or come up with different penalties, and that’s certainly understandable. This is just my attempt at coming up with a scoring approach and seeing what kind of results it generated.

· Although traditional Similarity Scores are often referenced in the context of Hall of Fame discussions and comparisons, I’m not really pushing for the same thing here. In my opinion, Hall of Fame candidates have a lot of potential areas to consider – their total careers, their peaks, the impact of individual seasons, contributions to successful teams, awards and honors, milestones, records, and so on.

Seasonal Notation Similarity Scores, by themselves, give some sense of the level and quality of a player’s performance, but not how prolific they were. And, many of my examples are for players who had short careers, and those short careers often "benefit" from being expressed in a per-162 game context because the player in question did not experience an extended decline phase.

For example, Pete Rose’s Seasonal Notation and rate stats (batting average, OBP, etc.) suffer, in part, because he kept on playing and playing and playing. Had he stopped playing a few years earlier, his rate stats and his Seasonal Notation numbers would have looked a lot better, but then he also wouldn’t have enjoyed the "bulk" totals he currently possesses. It’s a double-edged sword. And, the Hall of Fame, I believe, tends to favor those with longer careers, or at least those with more impressive career totals.

Background

I’ve always loved the concept of Similarity Scores, because the topic is close to my heart. In my everyday job, I am involved in demand planning and forecasting, and the concept of similarity comes up whenever we try to plan and forecast the items and product categories that we sell. Is a new product similar to another one that already exists? How similar? In what way might it be different? Is it in a similar product category but with some key product feature differences? Are the items being compared different brands? Are they priced differently? Are they promoted differently? What kind of "sales curve" do they follow? Do they tend to have stable sales, or do they fluctuate wildly by time of year? What are the implications of the similarities, and what are the implications of the differences?

Now, I will say that I suspect traditional Similarity Scores may not be leaned upon as heavily as they once were. Bill introduced them roughly 40 years ago, and people use them for all kinds of comparisons, including (but not limited to) Hall of Fame discussions. I think they were a big step forward in how we compare players (and, boy, do we like to compare players!). But there are a lot of caveats in using them, as everyone (including Bill) acknowledges. Similarity Scores use basic career stat categories, ones that are not adjusted for time or place, so that a home run is a home run regardless of when or where it was hit, and a .300 average is treated the same regardless of whether it was generated in the 1930’s or the 1960’s. Also, the categories for hitters are strictly offense-based, although there is a positional adjustment.

To level set, below is the explanation of Similarity Scores from baseball-reference.com (there’s one for batters and one for pitchers, this is just the one for hitters):

Similarity scores are not our concept. Bill James introduced them in the mid-1980s, and we lifted his methodology from his book The Politics of Glory (p. 86-106). To compare one player to another, start at 1000 points and then subtract points based on the statistical differences of each player.

Batters

One point for each difference of 20 games played.

One point for each difference of 75 at bats.

One point for each difference of 10 runs scored.

One point for each difference of 15 hits.

One point for each difference of 5 doubles.

One point for each difference of 4 triples.

One point for each difference of 2 home runs.

One point for each difference of 10 RBI.

One point for each difference of 25 walks.

One point for each difference of 150 strikeouts.

One point for each difference of 20 stolen bases.

One point for each difference of .001 in batting average.

One point for each difference of .002 in slugging percentage.
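
As a rough illustration of the batter rules quoted above, here is a minimal Python sketch. It assumes fractional points are kept (the quoted text does not say how partial differences are rounded) and it leaves out the positional adjustment, so the function name, the dictionary layout, and the rounding behavior are my assumptions rather than baseball-reference's actual implementation. The inputs are the Al Rosen and Bob Horner career lines from the table later in this article.

```python
# Divisors taken from the quoted batter list: one point per this much difference.
TRADITIONAL_DIVISORS = {
    "G": 20, "AB": 75, "R": 10, "H": 15, "2B": 5, "3B": 4, "HR": 2,
    "RBI": 10, "BB": 25, "SO": 150, "SB": 20, "BA": 0.001, "SLG": 0.002,
}


def traditional_similarity(a, b):
    """Start at 1000 and subtract points for each category difference.

    Assumptions (not stated in the quoted rules): fractional points are
    kept, and the positional adjustment is omitted.
    """
    penalty = sum(abs(a[cat] - b[cat]) / div
                  for cat, div in TRADITIONAL_DIVISORS.items())
    return 1000 - penalty


# Career lines copied from the Al Rosen table in this article.
rosen = {"G": 1044, "AB": 3725, "R": 603, "H": 1063, "2B": 165, "3B": 20,
         "HR": 192, "RBI": 717, "BB": 587, "SO": 385, "SB": 39,
         "BA": 0.285, "SLG": 0.495}
horner = {"G": 1020, "AB": 3777, "R": 560, "H": 1047, "2B": 169, "3B": 8,
          "HR": 218, "RBI": 685, "BB": 369, "SO": 512, "SB": 14,
          "BA": 0.277, "SLG": 0.499}

print(round(traditional_similarity(rosen, horner)))
```

This sketch need not reproduce the published score exactly, since the published figure may rest on rounding rules, unrounded rate stats, or adjustments that the quoted list does not spell out.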

The key here is that traditional Similarity Scores use a player’s career statistics. What I want to take a look at is comparing players on a "per opportunity" basis, which in my case is per 162 games. I could have used plate appearances, but I decided to put everything into a context of 162 games. And I think the most interesting results are for players who had abbreviated careers.

A couple of quick examples, with a quick sidebar:

For some reason, Al Rosen is a fascinating player to me, I suppose because he packed a lot into a very short career. A few bullet points:

· Rosen only played 10 seasons, and the first 3 of those were no more than brief cups of coffee as he was stuck behind Cleveland’s All Star third baseman, Ken Keltner, so he really only had 7 seasons, and really only 5 good ones.

· When Rosen finally did get an opportunity, he made the most of it. In 1950, he broke the AL record for home runs by a rookie with 37, a mark that stood until Mark McGwire hit 49 in 1987.

· In 1953, Rosen had what may be the best season any third baseman has ever had when he hit 43 home runs, drove in 145 runs, scored 115 runs, slugged .613, had an OPS+ of 180, and had 367 total bases, all of which were league-leading figures. He also hit .336, just missing the batting title (and a triple crown) by a single point to Mickey Vernon’s .337. In addition, he realized a rWAR of 10.1, which is still the only time a third baseman has achieved a WAR of 10.0 or higher. Rosen was named the unanimous MVP.

· In 1954, Rosen had one of the greatest individual performances in All Star history when he went 3 for 4 with 2 home runs, 5 RBI, and a walk. The 2 home runs and the 5 RBI are tied for the single-game highs in All Star game history.

Rosen also had what I think most people would agree was a generally successful post-playing career as an executive for the Yankees, Astros, and Giants.

Anyway, here’s the stat line for Al Rosen and his top 5 comps by traditional Similarity Scores:

Name	Score	G	AB	R	H	2B	3B	HR	RBI	SB	BB	SO	BA	SLG
Al Rosen	1,000	1,044	3,725	603	1,063	165	20	192	717	39	587	385	.285	.495
Bob Horner	934	1,020	3,777	560	1,047	169	8	218	685	14	369	512	.277	.499
Anthony Rendon	912	1,026	3,830	624	1,100	269	16	151	611	45	476	682	.287	.484
Josh Hamilton	910	1,027	3,909	609	1,134	234	24	200	701	50	352	938	.290	.516
Jim Ray Hart	906	1,125	3,783	518	1,052	148	29	170	578	17	380	573	.278	.467
Charlie Keller	904	1,170	3,790	725	1,085	166	72	189	760	45	784	499	.286	.518

Rosen had a short career (he spent a few years at the start of his career behind Ken Keltner, and then he retired early due to back issues), and so naturally the players considered most similar to him were players with similar career lengths who lined up close to his career stats.

But these aren’t the type of players who Rosen reminds me of. Well, Rendon feels like a decent comp, but he’s also an active player whose stats are in flux. I think Rosen was quite a bit better overall than Horner and Hart. Hamilton and Keller were good players (and I think Keller was probably a better overall hitter than Rosen), but they were outfielders. So, this feels a little unsatisfying to me in terms of capturing what kind of player Rosen was.

Another example:

Here’s the stat line for Dodger Hall of Famer Roy Campanella and his top 5 comps by traditional Similarity Scores:

Name	Score	G	PA	AB	R	H	2B	3B	HR	RBI	SB	BB	SO	BA	SLG
Roy Campanella	1,000	1,430	5,648	4,951	771	1,401	226	30	260	1,017	34	605	501	.283	.498
Javy Lopez	913	1,503	5,793	5,319	674	1,527	267	19	260	864	8	357	969	.287	.491
Brian McCann	866	1,755	6,850	6,067	742	1,590	294	5	282	1,018	25	640	1,054	.262	.452
Walker Cooper	862	1,473	5,082	4,702	573	1,341	240	40	173	812	18	309	357	.285	.464
Troy Tulowitzki	857	1,291	5,415	4,804	762	1,391	264	24	225	780	57	511	900	.290	.495
Jason Varitek	829	1,546	5,839	5,099	664	1,307	306	14	193	757	25	614	1,216	.256	.435

So, these players do have career hitting stats that bear some similarity to Campanella (although Lopez is the only one with a score over 900), but part of that is that Campanella had 2 major influences on his career stats – his early years and stats are severely understated due to his time spent in the Negro Leagues (8 years but with only 214 games that have been captured), and then his paralyzing injury before the 1958 season that eliminated whatever time he may have had remaining. As a result, Campanella only had the equivalent of about 9 full seasons worth of games.

So, while these are fine players, they are not the players that Campanella reminds me of.

Approach

In coming up with the scheme for Seasonal Notation Similarity Scores, I decided to keep some of the categories from traditional Similarity Scores, but to eliminate others.

I did away with total games played since everything is being expressed as a per-162 game context, so it’s totally unnecessary. I also did away with at bats, as I felt like it wasn’t real valuable in the per-162 game context.

I also eliminated hits, doubles, triples, and strikeouts, as I didn’t consider them essential. I could have kept strikeouts, but it’s been in such flux over time that I felt like I’d have to adjust or index everyone’s figures, and I was trying to keep this version pretty simple, so I just decided to eliminate them at this point.

So, from the original Similarity Score methodology, I’m keeping 7 of the original categories:

· Home Runs

· Runs

· RBI

· Walks

· Stolen Bases

· Batting Average

· Slugging Percentage

The first 5 are then adjusted to "per 162 games", with batting average and slugging percentage staying as is.
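
For anyone who wants to see the conversion spelled out, here is a small Python sketch of the per-162 adjustment, assuming it is a simple proportional scaling (career total times 162, divided by career games). The function and variable names are just illustrative; the inputs are Al Rosen's career totals from the table above.

```python
def per_162(total, games):
    """Scale a career counting stat to a 162-game season (simple proportion)."""
    return total * 162 / games


# Al Rosen's career totals, from the traditional Similarity Score table above.
rosen_games = 1044
rosen_totals = {"HR": 192, "R": 603, "RBI": 717, "SB": 39, "BB": 587}

rosen_sn = {cat: round(per_162(total, rosen_games))
            for cat, total in rosen_totals.items()}
print(rosen_sn)  # {'HR': 30, 'R': 94, 'RBI': 111, 'SB': 6, 'BB': 91}
```

Batting average and slugging percentage are already rates, so they pass through without any scaling.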

What am I adding?

I thought OBP should be included (I was kind of surprised that it wasn’t already, I had always assumed it was), so I added it. That brings us to 8.

I also thought some more current metrics might be useful, ones that adjust for context, so I added

· WAR (baseball reference version, per 162 games)

· dWAR (per 162 games)

· OPS+

That gives me 11 categories rather than the original 13. 10 would have been a more satisfying number, but I decided not to let that bother me.

Now, I know that dWAR (defensive WAR) and WAR overlap some (WAR essentially is total player value covering hitting, baserunning, and defense, and both dWAR and WAR incorporate a positional adjustment). Also, I’m sure not everyone is sold on dWAR as a measure, but ultimately I decided to keep both of them. WAR is a good approximation of overall value, and dWAR is at least something we can use to try to quantify defensive value, so I felt like they both brought something to the table, but I didn’t go all the way to bring in oWAR (offensive WAR) as a separate metric.

One of the other reasons I’m including dWAR is that I decided to make this Similarity Score totally about comparing players at the same primary position. That is, there’s no position adjustment I’m making in the score calculation as is done in traditional Similarity Scores – you’re either the same position or you’re not. I did come up with a "switch" in my spreadsheet that allows Similarity Scores to be generated ignoring the position mandate, but in that option I get rid of dWAR. Mostly, I’m going to focus on players at the same position, because I think that is a large part of what I think of as "similarity". I’m sure not everyone would agree with this, and I’m aware that, in most cases, a player’s "primary" position is not the only one they played at, so dWAR may not be a perfect metric to leverage, but it’s the approach I took.

The next step was establishing penalties for differences in each category. Without going into too much detail, I played around with the penalties until I got results that I was comfortable with based on the range of values in each category, the scale that each category uses, and the rollup of penalty points applied.

The table below summarizes where I landed, keeping in mind that most of the penalty point figures are a lot different than traditional Similarity Scores because we’re dealing with per-162 game figures, so the data we’re comparing is on a much smaller scale (with smaller differences) than career totals, and the relative size of the penalties for each unit difference had to reflect a different magnitude than in traditional Similarity Scores:

Category	Penalty for Difference
Home Runs per 162 games	2 points for each HR per 162 games
Runs per 162 games	1 point for each run per 162 games
RBI per 162 games	1 point for each RBI per 162 games
Stolen Bases per 162 games	3 points for each stolen base per 162 games
Walks per 162 games	1 point for each walk per 162 games
Batting Average	1 point for each .001 difference
OBP	.75 points for each .001 difference
Slugging Pct.	.5 points for each .001 difference
WAR per 162 games	10 points for each 1.0 WAR per 162 games
dWAR per 162 games	4 points for each 0.1 dWAR difference per 162 games
OPS+	1 point for each point difference

Again, there’s nothing magical about these penalty points – I just played around with them until I got what I thought were reasonable results. I’m sure they could be improved upon.
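
To make the penalty table concrete, here is a minimal Python sketch of how the score could be computed. It assumes the score starts at 1,000 (as in traditional Similarity Scores), that each penalty is linear in the absolute difference, and that fractional points are kept; the spreadsheet itself may differ in those details. The names are illustrative only, and the sample inputs are the rounded Roy Campanella and Yogi Berra figures from the first example table in the next section.

```python
# Points charged per one full unit of difference, read off the penalty table.
SN_POINTS_PER_UNIT = {
    "HR": 2.0,      # per HR per 162 games
    "R": 1.0,       # per run per 162 games
    "RBI": 1.0,     # per RBI per 162 games
    "SB": 3.0,      # per stolen base per 162 games
    "BB": 1.0,      # per walk per 162 games
    "BA": 1000.0,   # 1 point per .001
    "OBP": 750.0,   # .75 points per .001
    "SLG": 500.0,   # .5 points per .001
    "WAR": 10.0,    # 10 points per 1.0 WAR per 162 games
    "dWAR": 40.0,   # 4 points per 0.1 dWAR per 162 games
    "OPS+": 1.0,    # 1 point per point of OPS+
}


def sn_similarity(a, b):
    """1000 minus the summed penalties (assumed linear, fractional points kept)."""
    penalty = sum(abs(a[cat] - b[cat]) * pts
                  for cat, pts in SN_POINTS_PER_UNIT.items())
    return 1000 - penalty


campanella = {"HR": 29, "R": 87, "RBI": 115, "SB": 4, "BB": 69, "WAR": 4.7,
              "dWAR": 1.00, "OPS+": 126, "BA": 0.283, "OBP": 0.363, "SLG": 0.498}
berra = {"HR": 27, "R": 90, "RBI": 109, "SB": 2, "BB": 54, "WAR": 4.6,
         "dWAR": 0.70, "OPS+": 125, "BA": 0.285, "OBP": 0.348, "SLG": 0.482}

print(round(sn_similarity(campanella, berra)))  # 931
```

That lands within a point or two of the 932 shown for Berra in the Campanella table, and the small gap is presumably because the table displays rounded per-162 figures.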

Examples

OK. Hopefully that’s enough of a setup. Let’s put it through its paces.

I find that most of the "interesting" examples tend to be players who had abbreviated careers of one kind or another, as those are the ones who tend to benefit by looking at stats expressed in Seasonal Notation. Of course, often it’s true that those players get the benefit of not having what I would call an "elongated" decline phase which can affect a player’s rate stats. I fully acknowledge that effect.

A few notes:

· In each table, I’m going to put each category stat included in the calculation.

· The lists will show the top 10 comps in descending order of the score (the player being compared to is listed first, then the #1 comp, then the #2 comp, and so on).

· "SN" is shorthand for "Seasonal Notation".

· Unless otherwise noted, I’m only including comparison players who have at least 1,000 career games played and are classified as playing the same "primary" position.

· I’m also going to include career games as an information column just to put each player’s total career length in perspective, although obviously total career games are not part of the comparison. But, on many of these examples, I’m using players who had relatively short careers, so this is just a reminder of that and to keep the comparisons in perspective.

· Finally, if someone is among the player’s current top 10 traditional Similarity Score comps, I’ll put that rank in parentheses by the player’s name, so we get a sense of which players can be considered as similar regardless of whether we’re looking at their careers or their seasonal notation.
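
The selection rules in these notes (same primary position, at least 1,000 career games, top 10 by score) can be sketched in Python as below. This is only an illustration: the record layout, the function names, the "Hypothetical Short Career" player, and the two-category stand-in scoring function are all my assumptions for the demo, not the actual spreadsheet logic. The real player figures are taken from the tables later in this article.

```python
def top_comps(target, pool, score_fn, min_games=1000, n=10):
    """Return the n best (name, score) comps at the target's primary position."""
    eligible = [p for p in pool
                if p["name"] != target["name"]
                and p["games"] >= min_games
                and p["pos"] == target["pos"]]
    ranked = sorted(eligible, key=lambda p: score_fn(target, p), reverse=True)
    return [(p["name"], round(score_fn(target, p))) for p in ranked[:n]]


def toy_score(a, b):
    """Stand-in score using only HR and RBI per 162 (NOT the full 11 categories)."""
    return 1000 - 2 * abs(a["HR"] - b["HR"]) - abs(a["RBI"] - b["RBI"])


mattingly = {"name": "Don Mattingly", "pos": "1B", "games": 1785, "HR": 20, "RBI": 100}
pool = [
    mattingly,
    {"name": "Ripper Collins", "pos": "1B", "games": 1084, "HR": 20, "RBI": 98},
    {"name": "Eddie Murray", "pos": "1B", "games": 3026, "HR": 27, "RBI": 103},
    # Different primary position, so excluded by the position rule.
    {"name": "Eric Davis", "pos": "CF", "games": 1626, "HR": 28, "RBI": 93},
    # Made-up player below the 1,000-game cutoff, so excluded.
    {"name": "Hypothetical Short Career", "pos": "1B", "games": 900, "HR": 20, "RBI": 100},
]

print(top_comps(mattingly, pool, toy_score))
# [('Ripper Collins', 998), ('Eddie Murray', 983)]
```

Passing the scoring function in as an argument keeps the filtering and ranking separate from the penalty scheme, so the position "switch" or a different set of penalties could be swapped in without touching the ranking step.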

Let’s start by circling back to Roy Campanella:

Name	HOF?	Score	HR-SN	R-SN	RBI-SN	SB-SN	BB-SN	WAR-SN	dWAR-SN	OPS+	BA	OBP	Slug	Games
Roy Campanella		1,000	29	87	115	4	69	4.7	1.00	126	.283	.363	.498	1,430
Yogi Berra	Y	932	27	90	109	2	54	4.6	0.70	125	.285	.348	.482	2,120
Johnny Bench	Y	906	29	82	103	5	67	5.6	1.48	126	.267	.342	.476	2,158
Bill Dickey	Y	902	18	84	109	3	61	5.1	0.92	127	.313	.382	.486	1,789
Gabby Hartnett (9)	Y	898	19	71	96	2	57	4.6	1.08	126	.297	.370	.489	1,990
Buster Posey		872	19	78	86	3	64	5.3	1.16	129	.302	.372	.460	1,371
Jorge Posada (6)		865	24	80	94	2	83	3.8	0.23	121	.273	.374	.474	1,829
Carlton Fisk	Y	862	24	83	86	8	55	4.4	1.10	117	.269	.341	.457	2,499
Mike Piazza	Y	858	36	89	113	1	64	5.0	0.13	143	.308	.377	.545	1,912
Javy Lopez (1)		850	28	73	93	1	38	3.2	0.61	112	.287	.337	.491	1,503
Brian McCann (2)		831	26	68	94	2	59	3.0	0.73	110	.262	.337	.452	1,755

So, we can see that 4 players who were on Campanella’s top 10 traditional comp list also made his seasonal notation top 10. Campanella’s top 2 traditional comps (Lopez and McCann) are still on the list, but they’re much further down, while Hartnett is a little higher up, and Posada’s about the same.

The big difference now is that Campanella’s top 4 comps are all Hall of Famers, and his #1 comp is his contemporary and fellow 3-time 1950’s MVP, Yogi Berra, which I have to say I’m pretty happy with. And, as you can see, they’re pretty comparable among most categories, with Campanella having a little better OBP (and higher walks) and Slugging Percentage, and a little higher dWAR, with Yogi (of course) having the much higher career games figure. But on a per-162 game performance basis, they’re pretty close.

I don’t know about you, but this is a very satisfying list to me. Because of many of the reasons outlined earlier, Campanella often suffers when compared to other great catchers. Campanella is 17th in JAWS, for example. Now, that’s not a complaint. Everyone recognizes why he ranks so low on something like that, and we make proper adjustments. Only about 1,400 of his career games have been officially captured, and that’s a relatively low total. But, again, we know why that is.

When I think of Campanella, I think of him as a top-10 all-time catcher, possibly even top 5 depending on your perspective of what’s important and how to make the necessary adjustments. But I like the fact that his best comps are players like Berra, Bench, Dickey, and Hartnett rather than Lopez, McCann, Cooper and Tulowitzki.

OK. How about revisiting Al Rosen?

Name	HOF?	Score	HR-SN	R-SN	RBI-SN	SB-SN	BB-SN	WAR-SN	dWAR-SN	OPS+	BA	OBP	Slug	Games
Al Rosen		1,000	30	94	111	6	91	5.0	0.06	137	.285	.384	.495	1,044
Chipper Jones	Y	902	30	105	105	10	98	5.5	-0.06	141	.303	.401	.529	2,499
Eddie Mathews	Y	897	35	102	98	5	98	6.5	0.38	143	.271	.376	.509	2,391
Josh Donaldson		895	34	100	98	5	88	6.0	0.70	135	.269	.367	.505	1,201
David Wright		895	25	97	99	20	78	5.0	0.03	133	.296	.376	.491	1,585
Anthony Rendon (2)		890	24	99	96	7	75	5.1	0.79	126	.287	.369	.484	1,026
Troy Glaus		888	34	94	100	6	90	4.0	0.32	119	.254	.358	.489	1,537
Hank Thompson (9)		877	22	92	85	8	83	4.6	0.33	122	.274	.376	.460	1,069
George Brett	Y	875	19	95	96	12	66	5.3	0.13	135	.305	.369	.487	2,707
Ron Santo	Y	866	25	82	96	3	80	5.1	0.63	125	.277	.362	.464	2,243
Bob Elliott		861	14	87	98	5	79	4.2	-0.25	124	.289	.375	.440	1,978

Rosen’s previous #1 comp (Bob Horner) falls out of the top 10 (he’s down at 18 now). 2 others (Rendon and Thompson) are still in the top 10. Rosen’s #1 comp is now Chipper Jones.

Now, I will say this. Chipper is the #1 comp, but Chipper’s still better, and Chipper has edges in nearly every category above. Chipper is better on a seasonal basis, and he’s light years ahead on career, as his career is nearly two-and-a-half times as long. But, Rosen checks in with per-162 game figures of 30 HR, 94 runs, 111 RBI, 91 walks, a 137 OPS+, and a WAR per 162 of 5.0. That’s a pretty darn good ballplayer.

Again, as often happens with Similarity Scores, you can have players who are the most similar to you but who are better players than you. And, of course, the players right after Chipper Jones are players like Donaldson, Wright, Rendon, and Glaus. Donaldson (like Rosen) won an MVP (as did Elliott) and had other high finishes, and both Wright and Rendon have placed high as well. It’s an interesting blend of some of the very elite at the position (Brett, Mathews, Jones, Santo) and others who were pretty good, but for a much shorter time.

But I feel like this list gives a better representation of the quality of player Rosen was when he was actually playing than his traditional comp list. I’m not trying to put him in the Hall of Fame - he had a very short career. But he was a very good player while he was in there.

Who else can we look at? How about Jackie Robinson?

Name	HOF?	Score	HR-SN	R-SN	RBI-SN	SB-SN	BB-SN	WAR-SN	dWAR-SN	OPS+	BA	OBP	Slug	Games
Jackie Robinson		1,000	16	111	87	23	86	7.3	1.18	133	.313	.410	.477	1,416
Charlie Gehringer	Y	881	13	124	100	13	83	5.9	0.75	125	.320	.404	.480	2,323
George Grantham (1)		809	12	102	80	15	80	3.7	-0.24	122	.302	.392	.461	1,444
Dustin Pedroia		808	15	99	78	15	67	5.6	1.66	113	.299	.365	.439	1,512
Frankie Frisch	Y	808	7	107	87	29	51	5.0	1.51	110	.316	.369	.432	2,311
Tony Lazzeri	Y	806	17	92	111	14	81	4.4	0.48	121	.292	.380	.467	1,740
Eddie Collins	Y	801	3	104	74	42	86	7.1	0.46	142	.333	.424	.429	2,826
Rod Carew	Y	796	6	93	67	23	67	5.3	-0.11	131	.328	.393	.429	2,469
Nap Lajoie	Y	795	5	98	104	25	34	7.0	0.66	150	.338	.380	.466	2,480
Ryne Sandberg	Y	794	21	99	79	26	57	5.1	1.01	114	.285	.344	.452	2,164
Roberto Alomar	Y	792	14	103	77	32	70	4.6	0.22	116	.300	.371	.443	2,379

Robinson’s top comps in traditional Similarity Scores are George Grantham, Daniel Murphy, Freddie Lindstrom, Edgardo Alfonzo, and Denny Lyons. Lindstrom is the only Hall of Famer among Robinson’s top 10 traditional comp list (although Jose Altuve is currently sitting at #8). Grantham still makes the list, but it’s now virtually all Hall of Famers.

Now, much like Campanella, we understand the limits of traditional Similarity Scores when it comes to Robinson. Robinson had a very short career in MLB (10 seasons), and he didn’t debut with the Dodgers until age 28, so his career numbers understate his true value. He’s 10th in JAWS, which is impressive enough as is even on face value, but he’s even better than that. Robinson is 16th in career WAR among second basemen, but 6th in WAR7 (top 7 seasons). Robinson was a great player, arguably top 5 at the position. The brevity of his career is a big part of why his top 10 traditional comps aren’t very impressive.

Anyway, Seasonal Notation Similarity Scores illustrate how great Jackie was, and they yield a much more impressive list of comps. His WAR/162 of 7.3 is higher than every second baseman with 1,000 or more career games except for Rogers Hornsby (9.1). And, even though his updated top 10 comp list is 80% Hall of Famers, the only one with a relatively high Similarity Score figure is Charlie Gehringer (881). They’re reasonably similar, except Robinson has more WAR/162, more stolen bases per 162, and more quantitative defensive value. Gehringer’s a great player, one of my all-time favorites, but I think Robinson was the better player, all of which plays into the greatness and uniqueness of Robinson’s career performance.

How about another player who had an abbreviated career? Let’s look at Don Mattingly:

Name	HOF?	Score	HR-SN	R-SN	RBI-SN	SB-SN	BB-SN	WAR-SN	dWAR-SN	OPS+	BA	OBP	Slug	Games
Don Mattingly		1,000	20	91	100	1	53	3.8	-0.56	127	.307	.358	.471	1,785
Ripper Collins		965	20	92	98	3	53	3.7	-0.49	126	.296	.360	.492	1,084
Adrian Gonzalez (6)		922	27	84	101	1	66	3.7	-0.30	129	.287	.358	.485	1,929
Eddie Murray	Y	918	27	87	103	6	71	3.7	-0.62	129	.287	.359	.476	3,026
Ted Kluszewski		911	26	80	97	2	46	3.0	-0.93	123	.298	.353	.498	1,718
Mike Sweeney		910	24	85	101	6	58	2.8	-0.86	118	.297	.366	.486	1,454
Joe Torre	Y	908	18	73	87	2	57	4.2	-0.02	129	.297	.365	.452	2,209
Tony Perez	Y	906	22	74	96	3	54	3.2	-0.39	122	.279	.341	.463	2,777
Justin Morneau		905	26	81	103	1	60	2.8	-0.69	120	.281	.348	.481	1,545
Cecil Cooper (1)		905	21	86	96	8	38	3.1	-0.84	121	.298	.337	.466	1,896
Kent Hrbek		900	27	84	101	3	78	3.6	-0.71	128	.282	.367	.481	1,747

One reason I’m including Mattingly is that his #1 comp has one of the highest Seasonal Notation Similarity Scores that I’ve come across, and that’s Ripper Collins at a whopping 965. Collins was a member of the famous St. Louis Cardinals’ "Gas House Gang" of the 1930’s, but I think he often gets overshadowed by the more memorable characters from that team, such as Dizzy and Daffy Dean, Frankie Frisch, Pepper Martin, Joe Medwick and Leo Durocher. In the Gang’s most famous season (1934), Collins was probably the team’s most valuable position player (tied for league lead with 35 HR’s, led the league in total bases and slugging), and probably the 2nd best overall after Dean (who famously won 30 games). But, Collins ultimately had a pretty short career with only 7 seasons of 100 or more games.

Anyway, Collins is a very strong across-the-board match for Mattingly, with no major differences in the individual categories. All 10 of Mattingly’s top comps have scores of 900 or above.

2 players (Gonzalez and Cooper) carry over from Mattingly’s traditional Similarity Score comp list. 3 of the comps are Hall of Famers, but Torre is in more for his managerial success (although he was a fine player as well), while Murray and Perez both had much longer careers.

Charlie Keller came up early in the article, a great hitter with an abbreviated career. Let’s run him through the tool:

Name	HOF?	Score	HR-SN	R-SN	RBI-SN	SB-SN	BB-SN	WAR-SN	dWAR-SN	OPS+	BA	OBP	Slug	Games
Charlie Keller	-	1,000	26	100	105	6	109	6.1	-0.12	152	.286	.410	.518	1,170
Lance Berkman	-	902	32	99	106	7	104	4.5	-0.95	144	.293	.406	.537	1,879
Bob Johnson	-	893	25	108	112	8	93	4.9	-0.50	139	.296	.393	.506	1,863
Ralph Kiner	Y	860	41	107	112	2	111	5.3	-1.18	149	.279	.398	.548	1,472
Jeff Heath	-	851	23	91	104	7	69	4.4	-0.63	139	.293	.370	.509	1,383
Carl Yastrzemski	Y	847	22	89	90	8	90	4.7	0.05	130	.285	.379	.462	3,308
Monte Irvin	Y	843	22	89	108	8	73	5.0	0.24	134	.304	.388	.489	1,032
Ken Williams	-	825	23	100	106	18	66	5.0	-0.42	138	.319	.393	.530	1,397
Matt Holliday	-	823	27	98	104	9	68	3.8	-1.12	132	.299	.379	.510	1,903
Christian Yelich	-	808	24	103	85	20	83	5.0	-0.52	132	.292	.379	.477	1,095
Kevin Mitchell (2)	-	808	31	83	101	4	65	3.9	-1.07	142	.284	.360	.520	1,223

Keller’s top 3 traditional comps were Josh Hamilton, Kevin Mitchell, and Al Rosen. Mitchell is the only top 10 comp from Keller’s traditional list that survives, and he’s down at #9. I like the top 4 very much here – Keller, Berkman, Johnson, and Kiner all seem like the same mold – good combination of batting average, OBP, and pop, and generating around 100 runs/RBI/walks per 162 games played, really valuable offensive weapons. Kiner separated himself from the others because he gained notoriety from the 7 consecutive seasons he led the league in home runs, but they all seem to be cut from the same cloth.

How about Eric Davis, who for a season or two was about as exciting a player as I can remember watching?

Name	HOF?	Score	HR-SN	R-SN	RBI-SN	SB-SN	BB-SN	WAR-SN	dWAR-SN	OPS+	BA	OBP	Slug	Games
Eric Davis		1,000	28	93	93	35	74	3.6	-0.91	125	.269	.359	.482	1,626
Ray Lankford (10)		892	23	92	83	25	79	3.6	0.05	123	.272	.364	.477	1,701
Dale Murphy		881	30	89	94	12	73	3.5	-0.51	121	.265	.346	.469	2,180
Andrew McCutchen		876	25	97	86	18	85	4.2	-0.70	131	.280	.373	.476	1,761
Carlos Beltran		861	27	99	99	20	68	4.4	0.13	119	.279	.350	.486	2,586
Chili Davis		859	23	82	91	9	79	2.5	-0.94	121	.274	.360	.451	2,435
Ellis Burks		858	29	101	98	15	64	4.0	-0.54	126	.291	.363	.510	2,000
Grady Sizemore		846	22	97	76	21	71	4.1	0.06	115	.265	.349	.457	1,101
Fred Lynn		846	25	87	91	6	71	4.1	-0.26	129	.283	.360	.484	1,969
Bobby Murcer		843	21	83	89	11	73	2.7	-1.34	124	.277	.357	.445	1,908
Amos Otis		841	16	89	82	28	61	3.5	-0.31	115	.277	.343	.425	1,998

Outside of Lankford, it’s a completely different set of players from Davis’ traditional Similarity Score list. Davis’ top comps on the traditional scale are Kirk Gibson, Jeromy Burnitz, Darryl Strawberry, and Raul Mondesi, all of whom are pretty good comps, but none of whom were primarily center fielders.

Interesting to note that none of the comps are currently Hall of Famers. It’s a lot of players who exhibit 20-20 type of potential, but none of them has been able to reach Cooperstown, although Dale Murphy and Carlos Beltran are certainly in the discussion.

Davis is just a little bit shy of being the only player in history who would have a per-162 stat line with both 30 HR and 30 stolen bases (Fernando Tatis Jr. and Ronald Acuna Jr. both currently have that status, but of course they are quite early in their respective careers). Bobby Bonds is the closest at 29 HR and 40 stolen bases per 162.

Speaking of Darryl Strawberry:

Name	HOF?	Score	HR-SN	R-SN	RBI-SN	SB-SN	BB-SN	WAR-SN	dWAR-SN	OPS+	BA	OBP	Slug	Games
Darryl Strawberry		1,000	34	92	102	23	84	4.3	-0.74	138	.259	.357	.505	1,583
Reggie Jackson	Y	934	32	89	98	13	79	4.2	-0.94	139	.262	.356	.490	2,820
Jose Canseco		893	40	102	121	17	78	3.6	-1.18	132	.266	.353	.515	1,887
Rocky Colavito		888	33	85	102	2	84	3.9	-0.40	132	.266	.359	.489	1,841
Kirk Gibson		873	25	98	86	28	71	3.8	-0.63	123	.268	.352	.463	1,635
Bob Allison		871	27	85	84	9	84	3.6	-0.56	127	.255	.358	.471	1,541
Jose Bautista (9)		865	31	92	88	6	93	3.3	-0.71	124	.247	.361	.475	1,798
David Justice		863	31	93	102	5	91	4.1	-0.26	129	.279	.378	.500	1,610
Jackie Jensen		860	22	91	105	16	84	3.1	-0.44	120	.279	.369	.460	1,438
Jack Clark		859	28	91	96	6	103	4.3	-1.04	137	.267	.379	.476	1,994
Roger Maris		853	30	91	94	2	72	4.2	-0.18	127	.260	.345	.476	1,463

Reggie Jackson is by far the best comp by this method, with pretty close figures across the board except that Strawberry stole bases at about twice the rate. Through age 30, though, Jackson was stealing about 20 bases per 162 games, and that rate fell way off as he aged. In any case, Strawberry and Reggie were very comparable through their 20’s, and they show a very strong similarity on a per-162 basis, with Reggie ultimately playing nearly twice as many games.

Colavito’s another interesting comp with tight similarity to Strawberry across the board except for stolen bases, which represents the majority of the penalty points in the calculation of the score.

How about Mo Vaughn? Vaughn had a productive but relatively brief career.

Name	HOF?	Score	HR-SN	R-SN	RBI-SN	SB-SN	BB-SN	WAR-SN	dWAR-SN	OPS+	BA	OBP	Slug	Games
Mo Vaughn		1,000	35	92	114	3	78	2.9	-1.33	132	.293	.383	.523	1,512
Fred McGriff		931	32	89	102	5	86	3.5	-1.14	134	.284	.377	.509	2,460
Miguel Cabrera		929	31	94	113	2	75	4.3	-1.22	145	.310	.387	.532	2,587
David Ortiz	Y	928	36	95	119	1	89	3.7	-1.41	141	.286	.380	.552	2,408
Carlos Delgado		927	38	99	120	1	88	3.5	-1.37	138	.280	.383	.546	2,035
Rafael Palmeiro		917	33	95	105	6	77	4.1	-0.61	132	.288	.371	.515	2,831
Hal Trosky (9)		914	27	100	122	3	66	3.6	-0.96	130	.302	.371	.522	1,347
Travis Hafner		912	29	85	100	2	82	3.4	-1.34	134	.273	.376	.498	1,183
Prince Fielder (1)		910	32	87	103	2	85	2.4	-2.06	134	.283	.382	.506	1,611
Jason Giambi		903	32	88	103	1	98	3.6	-1.41	139	.277	.399	.516	2,260
José Abreu		901	33	89	115	2	47	4.0	-1.09	135	.290	.350	.515	1,113

Lots of players with Similarity Scores over 900. By traditional Similarity Scores, Vaughn’s top comps are Prince Fielder (who is still on the list above but drops to #8), Paul Goldschmidt (still active), Ted Kluszewski, Freddie Freeman (still active), and David Justice. McGriff is a pretty good fit for Vaughn, but of course McGriff’s career was about 60% longer. Cabrera also generates a pretty high score, but I do think he’s distinctly better than what Vaughn produced, and, of course, he had a much longer career as well. Vaughn had a nice stat line, and it was a nice career while it lasted.
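The career-length gap can be checked straight from the Games column. The sketch below uses only figures from the table above; the helper name `implied_total` is hypothetical. It computes how many more games McGriff played and backs out rough career home run totals from the per-162 rates. Because the table rates are rounded to whole numbers, the implied totals are approximations, not official career figures.

```python
# Per-162 home run rates and games played, copied from the table above.
GAMES_PER_SEASON = 162

players = {
    "Mo Vaughn": {"hr_sn": 35, "games": 1512},
    "Fred McGriff": {"hr_sn": 32, "games": 2460},
}


def implied_total(rate_per_162, games):
    """Back out an approximate career total from a rounded per-162 rate."""
    return rate_per_162 * games / GAMES_PER_SEASON


vaughn = players["Mo Vaughn"]
mcgriff = players["Fred McGriff"]

# Fraction by which McGriff's games total exceeds Vaughn's.
extra_length = mcgriff["games"] / vaughn["games"] - 1
print(f"McGriff played {extra_length:.0%} more games than Vaughn")

for name, row in players.items():
    total = implied_total(row["hr_sn"], row["games"])
    print(f"{name}: roughly {total:.0f} career HR implied by the rounded rate")
```

Similar per-162 rates over very different game totals are the situation this method is built to surface: the rates line up while the career bulk does not.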

Earlier, we looked at Campanella and Robinson, both of whom had their career stats affected by the Color Line and time spent in the Negro Leagues. How about looking at players who were primarily or even exclusively Negro Leaguers?

Now, I’m sure some might argue that we can’t take the stats literally, and there’s certainly discussion to be had there. I know there are a lot of efforts going on in the realm of trying to translate the stats for Negro League players into Major League Equivalencies (MLE), and I was tempted to leverage the work that had been done in that area, but I think I’ll save that exercise for another time after I get more of a chance to digest them. Besides, if we start making adjustments for Negro League stats, don’t we kind of have to consider doing that for everyone? Didn’t players who played in the National League and American League during the existence of the Negro Leagues benefit stat-wise from not facing all of the best available talent? Stats always reflect an intersection of performance, context, talent, circumstances, rules, ballparks, competition, and any number of other factors. They are never pure results. But, if you start adjusting some, I think you have to start adjusting for everyone. I mean, does anyone think that Ty Cobb would produce a .366 career average in another era? Neither do I.

So, for now... what if we simply take the Negro League stats at face value, but with the understanding that there are likely many things that influence them? What do they imply, and what kind of shape and comparisons do they invite? How about if we do that for now, and I will also take it as a "to-do" to follow up with MLE’s and see what kind of results those yield?

Here's the great Buck Leonard:

Name	HOF?	Score	HR-SN	R-SN	RBI-SN	SB-SN	BB-SN	WAR-SN	dWAR-SN	OPS+	BA	OBP	Slug	Games
Buck Leonard		1,000	26	147	152	9	109	7.9	-0.39	181	.345	.450	.589	587
Lou Gehrig	Y	912	37	141	149	8	113	8.5	-0.67	179	.340	.447	.632	2,164
Jimmie Foxx	Y	839	37	122	134	6	102	6.5	-0.41	163	.325	.428	.609	2,317
Hank Greenberg	Y	819	38	122	148	7	99	6.4	-0.51	159	.313	.412	.605	1,394
Dan Brouthers	Y	790	10	148	126	25	81	7.7	-0.16	171	.342	.423	.520	1,676
Jeff Bagwell	Y	734	34	114	115	15	106	6.0	-0.54	149	.297	.408	.540	2,150
Johnny Mize	Y	712	31	96	115	2	74	6.1	-0.56	158	.312	.397	.562	1,884
Joey Votto		710	28	95	91	7	110	5.5	-0.47	148	.302	.416	.520	1,900
Todd Helton		706	27	101	101	3	96	4.5	-0.36	133	.316	.414	.539	2,247
Frank Thomas	Y	691	36	104	119	2	116	5.1	-1.57	156	.301	.419	.555	2,322
Paul Goldschmidt		668	31	104	102	15	92	5.6	-0.36	142	.293	.389	.521	1,469

*This is based on comps with 1,000 or more games. If I lower it to 500 or more, another Negro League great (Mule Suttles) pops in at #3.

Now, obviously Leonard’s traditional Similarity Score comps aren’t great examples because Leonard’s published career stats only capture 587 games, so he tends to be compared on the basis of career totals with players who have that same type of context. As a result, it mostly captures other great Negro League stars like Cristobal Torriente, Bullet Rogan, Ben Taylor, Edgar Wesley, and Heavy Johnson, who were great players with a similar number of career "official" games, but the primary AL/NL players it comes up with are Dale Alexander and Lefty O’Doul, who had short careers.

As you can see, Leonard’s Seasonal Notation Similarity Score comps yield a pretty star-studded list, and the comp with the highest score, by far, is none other than Lou Gehrig. Poetic justice, in my book, and in Leonard’s "book" as well. Leonard’s biography, which he wrote with James A. Riley, was titled "Buck Leonard: The Black Lou Gehrig: The Hall of Famer's Story in His Own Words", and comparisons to Gehrig were common while Leonard was active. The great Monte Irvin once commented that, had Leonard been allowed to play in MLB, then they might have referred to Gehrig as the white Buck Leonard instead of the other way around.

Now, again, I’m sure some might be reluctant to directly compare Negro League stats to NL or AL stats, and I understand that. You can draw your own conclusions. However, by any standard, Leonard was probably the greatest first baseman in Negro League history, and Gehrig was probably the greatest in either the NL or the AL. The bold type (league category leadership) on Leonard’s Baseball Reference page definitely reaches out and grabs you almost as much as Gehrig’s does. I think Leonard is probably more comparable to Gehrig (and vice versa) than anyone else in history, and that includes Gehrig’s traditional #1 comp, Jimmie Foxx.

I think Leonard and Gehrig are both elite, and they deserve to be compared to each other. I suspect the consensus of most experts is that Gehrig had a little more home run power and was maybe a little better overall. Buck O’Neil commented that he thought Gehrig had more home run power but thought that Leonard was better defensively. In his "Baseball 100" list, Joe Posnanski has Gehrig at #14, and Leonard at #53. Isolating it to just first basemen (and if we consider Stan Musial as an outfielder rather than a first baseman, which is what I would normally do), Posnanski has Gehrig #1 (#14 overall), Albert Pujols #2 (#23 overall), Jimmie Foxx #3 (#33 overall), and Leonard #4 (#53 overall). In the New Bill James Historical Abstract (which is now about 20 years old), Bill also had Gehrig at #14 overall with Leonard at #65, but he had more first basemen in between the two. His top first basemen were, in order, Gehrig, Foxx, Mark McGwire, Mule Suttles (who could also be considered an outfielder), Jeff Bagwell, Eddie Murray, Johnny Mize, Harmon Killebrew, and then Leonard. So, he essentially had Leonard around #8 or #9 among first basemen.

By the way, I tried using the Major League Equivalent figures (MLE’s) that I alluded to earlier on Leonard, and if I use those, his #1 Seasonal Notation comp would be Will Clark. Clark’s a great player, but I think Leonard was probably better than Clark.

How about Cool Papa Bell?

Name	HOF?	Score	HR-SN	R-SN	RBI-SN	SB-SN	BB-SN	WAR-SN	dWAR-SN	OPS+	BA	OBP	Slug	Games
Cool Papa Bell		1,000	8	155	80	39	72	4.3	-0.51	126	.325	.395	.446	1,199
Jimmy Ryan		889	9	132	88	34	65	3.5	-0.81	124	.308	.375	.444	2,014
George Van Haltren		884	6	134	83	47	71	3.2	-0.92	122	.316	.386	.418	1,990
Hugh Duffy	Y	862	10	145	121	54	62	4.0	-0.23	123	.326	.386	.451	1,737
Mike Griffin		859	4	151	77	51	87	4.4	-0.09	123	.296	.388	.407	1,513
Earle Combs	Y	853	6	132	70	11	75	5.0	-0.30	125	.325	.397	.462	1,455
Pete Browning		851	6	131	90	35	64	5.6	-0.79	163	.341	.403	.467	1,183
George Gore		848	6	164	76	21	89	4.9	-0.49	136	.301	.386	.411	1,310
Mike Donlin		844	8	103	84	33	48	4.5	-0.86	144	.333	.386	.468	1,049
Ben Chapman		826	8	108	92	27	78	4.0	0.05	114	.302	.383	.440	1,717
Edd Roush	Y	822	6	91	81	22	40	3.8	-0.50	126	.323	.369	.446	1,967

I’m not real satisfied with that list, because it’s heavily dominated by pre-1900 players like Ryan, Van Haltren, Duffy, Griffin, Browning, and Gore, and a lot of that is driven by those players’ stolen base figures that got a boost from how stolen bases were defined for several years in that era (which included things like taking an extra base on a single, for example). Players from that era also have the advantage of being in a high-scoring environment, which enables them to be better "comps" to Bell’s rather striking per-162 game figure of 155 runs scored.

So, this is a good case where I think some intervention would be a good idea. How about if we limit it to players whose careers were from 1901 or later?

Name	HOF?	Score	HR-SN	R-SN	RBI-SN	SB-SN	BB-SN	WAR-SN	dWAR-SN	OPS+	BA	OBP	Slug	Games
Cool Papa Bell		1,000	8	155	80	39	72	4.3	-0.51	126	.325	.395	.446	1,199
Earle Combs	Y	853	6	132	70	11	75	5.0	-0.30	125	.325	.397	.462	1,455
Ben Chapman		826	8	108	92	27	78	4.0	0.05	114	.302	.383	.440	1,717
Edd Roush	Y	822	6	91	81	22	40	3.8	-0.50	126	.323	.369	.446	1,967
Cesar Cedeno		791	16	88	79	44	54	4.3	-0.35	123	.285	.347	.443	2,006
Brett Butler		772	4	99	42	41	83	3.6	-0.45	110	.290	.377	.376	2,213
Johnny Damon		767	15	109	74	27	65	3.7	-0.13	104	.284	.352	.433	2,490
Kenny Lofton		756	10	118	60	48	73	5.3	1.19	107	.299	.372	.423	2,103
Dom DiMaggio		754	10	121	72	12	87	3.9	0.35	111	.298	.383	.419	1,399
Lenny Dykstra		753	10	102	51	36	81	5.4	0.90	120	.285	.375	.419	1,278
Amos Otis		748	16	89	82	28	61	3.5	-0.31	115	.277	.343	.425	1,998

That may feel a little better, but it also results in generally lower scores. Bell’s rate of 155 runs per 162 games is obviously a tough comparison point that results in a pretty hefty penalty for most of these players who can’t approach that level.
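The era restriction can be pictured as a simple filter on the original top 10. In the sketch below, the scores are copied from the first Bell table, but the first-season years are not in the article’s tables; they are added here purely for illustration. The cutoff rule (first season in 1901 or later) is an assumption about how "careers from 1901 or later" is applied.

```python
# (name, score, first_season). Scores come from the first Bell table above.
# First-season years are NOT shown in the article; added for illustration.
original_top_10 = [
    ("Jimmy Ryan", 889, 1885),
    ("George Van Haltren", 884, 1887),
    ("Hugh Duffy", 862, 1888),
    ("Mike Griffin", 859, 1887),
    ("Earle Combs", 853, 1924),
    ("Pete Browning", 851, 1882),
    ("George Gore", 848, 1879),
    ("Mike Donlin", 844, 1899),
    ("Ben Chapman", 826, 1930),
    ("Edd Roush", 822, 1913),
]

CUTOFF_YEAR = 1901  # assumed rule: first season must be 1901 or later


def filter_by_era(comps, cutoff=CUTOFF_YEAR):
    """Keep only comps whose first season is at or after the cutoff."""
    return [(name, score) for name, score, first in comps if first >= cutoff]


survivors = filter_by_era(original_top_10)
for rank, (name, score) in enumerate(survivors, start=1):
    print(rank, name, score)
```

Only Combs, Chapman, and Roush survive from the original ten, which is why the restricted list has to reach down to lower-scoring players to fill the remaining spots.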

Combs is a pretty good match at face value in most categories except for stolen bases, although Combs reportedly had excellent speed himself. Of course, Bell’s speed was, by most accounts, more in the "elite" realm as opposed to merely "excellent". Combs reportedly was a pretty proficient base stealer at AA Louisville, but he didn’t really translate his speed into stolen bases at the MLB level, which, if you’re hitting at the top of the order ahead of players like Ruth and Gehrig, would make sense. Miller Huggins essentially instructed him to get on base and let the big guys hit him in, and I can’t say as I blame him.
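To see where Bell and Combs diverge, the raw per-162 gaps can be listed category by category. This sketch uses only the counting-stat columns from the table above. It shows unweighted differences, not penalty points: the actual penalty weights are not given in this section, so no score is computed.

```python
# Per-162 counting stats copied from the 1901-or-later table above.
CATEGORIES = ["HR", "R", "RBI", "SB", "BB"]

bell = {"HR": 8, "R": 155, "RBI": 80, "SB": 39, "BB": 72}
combs = {"HR": 6, "R": 132, "RBI": 70, "SB": 11, "BB": 75}


def category_gaps(a, b):
    """Absolute per-162 difference in each category (unweighted)."""
    return {cat: abs(a[cat] - b[cat]) for cat in CATEGORIES}


gaps = category_gaps(bell, combs)

# List the categories from largest gap to smallest.
for cat, gap in sorted(gaps.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{cat}: {gap}")
```

Stolen bases (28) and runs (23) account for most of the raw distance, while home runs and walks are nearly identical.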

By the way, if I don’t control for position (which also means eliminating dWAR and recalculating penalties and scores) and I only take players from 1901 or later, the top 3 comps for Bell are Ross Youngs, Kiki Cuyler, and Paul Molitor.

One more note... as with Leonard, I went ahead and ran Bell using the MLE’s on him, and his #1 comp was Edd Roush (908), who played roughly 10 years prior to Bell and who is also the #3 comp on Bell’s 1901-or-later list displayed above. So... what do you think of Cool Papa Bell and Edd Roush as comparable? Maybe...

Anyway, in Bell’s case, I’m not sure which list yields the better "similarity". Is it the group dominated by the pre-1900 players? Or the more modern one? I’m not sure.

I will say that some of the truly elite Negro League players (thinking in particular of Josh Gibson and Oscar Charleston) present a challenge because of the unique, high-level nature of their stats, but that can also be true of traditional career-based Similarity Scores. For example, the "most similar" traditional Similarity Score comp for Pete Rose is Paul Molitor at a measly 678, and the closest to Cy Young is Walter Johnson at 703. Sometimes, there are no great comps.

Gibson’s top Seasonal Notation comp is Mike Piazza, but the Similarity Score is a ridiculously low 418. Then again, Gibson’s top traditional Similarity Score comp is another Negro League star catcher (Biz Mackey), and that score is under 700. Gibson simply doesn’t have anyone who compares very closely unless his stats undergo a severe adjustment. His stats per 162 are simply off the charts.

Charleston’s top Seasonal Notation comp based on 1,000 or more games is Ty Cobb, but it’s also a pretty low score (732). Charleston’s top 2 comps if we lower the threshold to 500 or more games are 2 great Negro League center fielders, Turkey Stearnes (Score of 850) and Cristóbal Torriente (747).

So, I think there are certainly some things we can glean from putting Negro League stars through this mechanism to see how players compare at face value of the stats on a per 162 basis, but there might have to be some additional work done on applying some adjustments to make the comparisons even more meaningful.

A few other quick ones for a few of the big-name Negro League stars, focusing on the #1 comps only, and to be honest, I’m only going to give their "face value" #1 comps, because I think the #1 "MLE" comps don’t do them justice.

The #1 comp for Biz Mackey is Bill Dickey (890).

The #1 comp for Mule Suttles is Hank Greenberg (864).

The #1 comp for Judy Johnson is Pie Traynor (910).

The #1 comp for Ray Dandridge is also Pie Traynor (960).

The #1 comp for Cristóbal Torriente is Tris Speaker (881).

The #1 comp for John Henry Lloyd is Arky Vaughan (802). (Note: I made Lloyd’s primary position shortstop even though his official data classifies him with more games at second base.)

The #1 comp for Turkey Stearnes is Joe DiMaggio (795).

The #1 comp for Martín Dihigo is Alex Rodriguez (917). (Dihigo is listed in the database as a SS. If I ignore position and dWAR, other top comps include Ken Williams, Larry Walker, and Earl Averill.)

The #1 comp for Monte Irvin is Carl Yastrzemski (909). (Note that Irvin’s stats combine both his time in the Negro Leagues as well as his years in the National League.)

How about an active player? Again, the big caveat here is that an active player’s rate stats and per-162 game figures won’t necessarily hold up once his complete career is in the books, but it’s still fun to compare.

Let’s try Freddie Freeman:

Name	HOF?	Score	HR-SN	R-SN	RBI-SN	SB-SN	BB-SN	WAR-SN	dWAR-SN	OPS+	BA	OBP	Slug	Games
Freddie Freeman		1,000	28	100	97	5	80	4.5	-0.83	138	.295	.384	.509	1,565
Will Clark		966	23	97	99	5	77	4.6	-0.83	137	.303	.384	.497	1,976
Rafael Palmeiro		937	33	95	105	6	77	4.1	-0.61	132	.288	.371	.515	2,831
Fred McGriff		925	32	89	102	5	86	3.5	-1.14	134	.284	.377	.509	2,460
Dolph Camilli		918	26	102	103	7	103	4.7	-0.54	136	.277	.388	.492	1,490
Hal Trosky (10)		908	27	100	122	3	66	3.6	-0.96	130	.302	.371	.522	1,347
Norm Cash		908	29	81	86	3	81	4.0	-0.71	139	.271	.374	.488	2,089
Kent Hrbek		906	27	84	101	3	78	3.6	-0.71	128	.282	.367	.481	1,747
Miguel Cabrera		904	31	94	113	2	75	4.3	-1.22	145	.310	.387	.532	2,587
Eddie Murray	Y	901	27	87	103	6	71	3.7	-0.62	129	.287	.359	.476	3,026
Mo Vaughn (8)		900	35	92	114	3	78	2.9	-1.33	132	.293	.383	.523	1,512

Earlier, I mentioned that Don Mattingly/Ripper Collins had an extremely high Similarity Score of 965, but Freeman and Will Clark nose them out by 1 point, although the similarity will probably start to deteriorate some over the balance of Freeman’s career. Freeman’s got a few more home runs, but every other category is really close.

By the way, does Freeman seem on track to the Hall of Fame? I’m not sure. He’s making good progress, picking up markers here and there with an MVP and a World Series ring, but his MVP was in an abbreviated season, and I kind of get the sense that he may not be realizing enough "sizzle", for lack of a better word. He’s tracking pretty well to the stat line of Eddie Murray through age 31, although Murray’s WAR through the same age was about 10 higher than Freeman’s (Murray didn’t have an MVP, but he did finish 2nd twice). Murray was ahead, but not by all that much, and then Murray tacked on about 200 more home runs and about 1,400 more hits from that point forward to finish over 500 homers and 3,000 hits. If Freeman replicates that kind of bulk from age 32 on, he’ll be in good shape, but of course that remains to be seen. I think he’s making good progress, but he’s not at "lock" status yet.

How about one more active player? Here’s Jose Altuve:

Name	HOF?	Score	HR-SN	R-SN	RBI-SN	SB-SN	BB-SN	WAR-SN	dWAR-SN	OPS+	BA	OBP	Slug	Games
Jose Altuve		1,000	18	100	72	29	50	4.7	0.06	125	.308	.360	.462	1,437
Roberto Alomar	Y	912	14	103	77	32	70	4.6	0.22	116	.300	.371	.443	2,379
Larry Doyle		888	7	88	73	27	57	4.1	-0.20	125	.290	.357	.408	1,766
Craig Biggio	Y	877	17	105	67	24	66	3.7	-0.16	112	.281	.363	.433	2,850
Ryne Sandberg	Y	875	21	99	79	26	57	5.1	1.01	114	.285	.344	.452	2,164
Rod Carew	Y	847	6	93	67	23	67	5.3	-0.11	131	.328	.393	.429	2,469
George Grantham		847	12	102	80	15	80	3.7	-0.24	122	.302	.392	.461	1,444
Frankie Frisch	Y	847	7	107	87	29	51	5.0	1.51	110	.316	.369	.432	2,311
Ray Durham		844	16	102	72	22	67	2.8	-0.42	104	.277	.352	.436	1,975
Hardy Richardson		843	9	137	101	25	46	5.0	0.36	131	.299	.344	.437	1,334
Julio Franco		842	11	82	77	18	59	2.8	-0.18	111	.298	.365	.417	2,527

Altuve is only 31 and had a nice bounce-back season in 2021 after a rough 2020 that was not just pandemic-shortened but also was his first season dealing with the effects of the sign-stealing controversy. But, all things considered, Altuve lines up pretty well with Alomar at this point, and it’s my opinion that he’s still well on track for the Hall of Fame despite the controversy, especially if he continues to perform well.

Wrapping it Up

Well, I could go on forever with examples, but I’m sure you get the idea. I haven’t tried coming up with something similar for pitchers yet, but may go that route if it seems fruitful.

If you have any players you’d like to see run through this methodology, please submit them in the comments and I’d be happy to share the results. I should be able to produce one for any position player you suggest.

Thank you for reading,

Dan

COMMENTS (16 Comments, most recent shown first)

OwenH
Fantastic article, Dan. I always enjoy your pieces and posts here, and this is one of your best.
4:05 PM Feb 28th

Manushfan
Hrmmm John Stone! I remember him. And Greenwell yeah that actually makes sense too.
11:51 AM Feb 28th

mpiafsky
This is pretty great. I'm also a big fan of similarity scores, but they're not quite useful. Your lists are clearly superior-- at least for upper echelon players and presumably journeymen as well. Thank you.
10:14 AM Feb 28th

3for3
Ha, mine only double posted!

As far as an 'accelerator', I think anything I'd come up with would make the numbers much different from traditional sim scores. Perhaps something like 1 point for the first difference, then 1.03, 1.06, 1.09 etc. You could then adjust the values so that they start out smaller. Probably something a spreadsheet could handle.
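One possible reading of that accelerator, as a rough sketch only: each successive unit of difference within a single category costs a little more than the one before (1, 1.03, 1.06, ...). The function names and the idea of charging 1 point per unit are assumptions made for illustration; the article’s actual per-category weights are not reproduced here. The example uses the Eric Davis and Chili Davis stolen base figures (35 and 9) quoted elsewhere in this thread.

```python
def linear_penalty(gap):
    """Baseline for comparison: every unit of difference costs 1 point."""
    return float(gap)


def escalating_penalty(gap, step=0.03):
    """Each successive unit costs `step` more than the previous one.

    The first unit costs 1, the second 1 + step, the third 1 + 2*step, etc.
    """
    return sum(1 + step * k for k in range(gap))


# Stolen base gap between Eric Davis (35) and Chili Davis (9).
davis_gap = 35 - 9

print(linear_penalty(davis_gap))
print(round(escalating_penalty(davis_gap), 2))
```

Under this reading a small gap costs almost the same as before, while a 26-unit gap costs noticeably more than 26 points, so one large mismatch drags a score down more than several small ones.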
10:09 AM Feb 28th

DMBBHF
Sorry for the multiple posts, guys....My system was acting goofy and posted my prior reply 3 times.
9:09 AM Feb 28th

malbuff
Excellent study. This one will be a lot of fun to play with.
9:08 AM Feb 28th

DMBBHF
Thanks for all the comments, guys...

Terry,

One of the things I built into the model was the option to change the variables easily. When I remove the position requirement (which also removes dWAR) and I remove any penalties for stolen bases (but keep the other category penalties), Jackie's top 10 looks like this:

Arky Vaughan
Mickey Cochrane
Charlie Gehringer
Christian Yelich
Minnie Minoso
Paul Waner
Wade Boggs
Rusty Greer
Bobby Abreu
John Olerud

Kind of a weird mix, but that's what it came up with. You're right, a pure statistical comparison is a bit of a challenge for Jackie.

By the way, when I plug in Rickey Henderson, by far the closest comp is Tim Raines with an 808 score (Raines is #9 by traditional Similarity Scores).

Manushfan,

Here are the top 10 for Heinie:

John Stone 915 (Stone and Manush were teammates for a couple of years)
Riggs Stephenson 912
Zack Wheat 903
Chick Hafey 897
Bibb Falk 896
Mike Greenwell 882
Joe Medwick 880
Bobby Veach 875
Bob Fothergill 874
Irish Meusel 872

Wheat and Medwick are also on Manush's traditional top 10 list.

3for3,

Yes, that's a good idea regarding how to penalize successive differences and handling big gaps. I'd be interested to hear if you have some specifics on how to try to capture that, and maybe I can incorporate it?

Thanks all!
Dan

9:07 AM Feb 28th

ventboys
That's a lot to unpack, but just looking at one guy ... what would you get if you took Jackie Robinson and removed stolen bases? I suspect his closest historical comps should be players who, in their respective eras, probably stole way more bases.

I think the closest player to Jackie as a player -- skillsetwise -- was probably Rickey Henderson. But I don't know if that's ever going to sift out of a purely statistical comparison.

Good stuff!
12:48 AM Feb 28th

tigerlily
Thanks Dan. I think this is another legitimate way to put player's careers into context.
9:51 PM Feb 27th

FrankD
Great study. I believe you have a better similarity measurement in that yours passes the 'eye test'. This is important in that although a math/model results in some outputs, one of the first tests of these outputs is 'does this really make sense'.
5:12 PM Feb 27th

Manushfan
I like this! Be interesting to see how Mr Manush fares in this.
12:33 PM Feb 27th

3for3
Love this. I was really hoping to see Eric Davis here. Thanks for adding him. Some of his comparables are less than satisfying, but he is a fairly unique player, so getting 10 matches means you have to dig down deeper.

One thing I always thought the similarity score missed was when there was a big gap in just one stat, that didn't matter as long as the rest look good.

An example for Eric Davis, is his cousin, Chili. Eric stole 35 bases, while Chili only stole 9. Fred Lynn as well.

If I was ever to try a project like this, I'd have each successive difference count more. I realize that might make the method too complicated, but I think it would get players who are more similar in all aspects of the game on the list.
11:04 AM Feb 27th

LoradoTaftWright
Interesting idea, and I think the NS* comps are overall closer than the traditional ones.

I think Frankie Frisch is the most Robinson-like of the new comps, far more so than The Mechanical Man (who was a great player though). Not in a really quantifiable manner though.

* New System
8:35 PM Feb 26th
