In Defense of Bobby Richardson's Defense

By Steven Goldleaf

May 21, 2018

As I threatened to do if no one came up with a head-to-head comparison of Win Shares to WAR, I ran one such comparison myself, and came up with one genuine head-scratcher, which I’ll describe in much detail below.

For convenience’s sake, I chose to compare Win Shares to WAR on the first team I remember rooting for actively, the 1961 Yankees. At age eight, I actually rooted for the Reds, due to the Yankee-hating and Pinson-loving older cousin who introduced me to baseball, but I saw nothing wrong at that age in also following the ’61 Yankees. The next year or the year after, I think, I played my first baseball board game, which was called "Beat The Yankees," or something like that, in which a virtual All-Star team of the rest of the AL (or maybe all of MLB?) struggled to defeat the Yankees. In retrospect, I think the absurd conceit behind this game became an eventual source of my contempt for the Yankees and their fans: that in their universe a fair fight would pit the best players on every other team against their Yankees, and the Yankees would still win most of those contests. They did in the board-game universe version, anyway; now I peg the Yankees-vs.-the-rest-of-MLB at about a .250 winning percentage, maybe. Anyway, the team is sufficiently legendary that most of you should be familiar with it.

The chart comparing their top 25 contributors (as Win Shares rank them) should be self-explanatory, but I’ll explain it briefly anyway: first column is how the players rank according to Win Shares, second column is the players’ names, third is Win Shares themselves, fourth is WAR, fifth is how these 25 players rank according to WAR (blanks denote that the ranking is unchanged from the Win Shares ranking) and the final column is the ratio of Win Shares to WAR for the first dozen players.

WS Rank	Player	WS	WAR	WAR Rank	WS/WAR Ratio
1	Mickey Mantle	49.7	10.5		4.7
2	Roger Maris	35.5	6.9		5.1
3	Elston Howard	29.5	5.3		5.6
4	Whitey Ford	21.9	4.2		5.2
5	Luis Arroyo	21.8	3.5	7	6.2
6	Tony Kubek	21.4	3.4	8	6.3
7	Bill Stafford	16.8	3.8	5	4.4
8	Bill Skowron	16.5	2.4	10	6.9
9	Johnny Blanchard	16.4	2.6		6.3
10	Yogi Berra	15.4	2.1	12	7.3
11	Clete Boyer	14.5	3.8	6	3.8
12	Ralph Terry	14.3	2.3	11	6.2
13	Bobby Richardson	11.7	-0.7	34
14	Jim Coates	10.2	0.3	16
15	Rollie Sheldon	9.9	0.7	14
16	Bud Daley	5.9	0	22
17	Bob Cerv	4.5	1.2	13
18	Hal Reniff	4.4	0.6	15
19	Hector Lopez	2.9	-0.3	32
20	Billy Gardner	1.2	-0.1	24
21	Art Ditmar	1.0	-0.8	35
22	Joe DeMaestri	1.0	-0.1	26
23	Tex Clevenger	0.5	-0.3	31
24	Jesse Gonder	0.5	0.1	19
25	Deron Johnson	0.3	-0.3	30

I’ve used (copied, actually) both rankings from thebaseballgauge.com: the Win Shares are their version (which differs slightly, beyond mere rounding, from the figures in Bill’s 2002 Win Shares book), and the WAR is what they call bWAR, the "Baseball-Reference Wins Above Replacement" version.

Both rankings agree on the order, for the first few players at least: each one ranks the top four 1961 Yankees identically. Then the two charts begin to diverge, but they don’t get downright in-your-face disputatious until we reach WS ranking #13, Bobby Richardson, whom Win Shares seems to view as a solid if undistinguished contributor but whom WAR diagnoses as a blind, plague-carrying leper in the final stages of tertiary syphilis, the 34th-best Yankee on the 1961 team, a guy who cost the Yankees close to one victory, compared to some minor league zhlub whom they could have acquired for the cost of a phone call, give or take a nickel. Since Richardson’s WS and WAR disagree by so much, that’s where I’ve focused my attention. Why does he rank 13th by WS but 34th by WAR?

My assumption is that WAR must have some rational metric that finds Richardson lacking, which WS does not weigh nearly as heavily, and that their negative overall WAR for him means that he played below replacement level. I don’t think they’re claiming that Bud Daley, for example, with a WAR of 0, contributed nothing at all to the Yankees, just that his contributions are roughly what their top AAA pitcher could have contributed. Richardson’s -0.7 WAR rating means to me that they would have done better to send Richardson to the minors and bring up the best middle-infielder they had down there. Translated into words, I take "-0.7" to mean that Richardson cost the Yankees nearly a full win in 1961, compared to whoever would have replaced him on the roster if they’d decided to send him to Richmond for the summer. That move, presumably, wouldna helped much, but it wouldna hurt.

This assumption is probably wrong, because it’s not only insulting to Richardson, it’s pretty insulting to the Yankees’ front office, the thought that they promoted a half-dozen or so guys who couldn’t play as well as the guys they chose not to promote. But what else can a negative Wins Above Replacement figure mean, if not that this player is below the level of an easily available replacement?

I don’t mean to be insulting to anyone, but this stuff puzzles me the closer I look at it. I don’t mean to insult the 1961 Yankees’ bench (I forget where, but Bill has someplace in his writings described the ’61 Yankees bench as a "wretched hive of scum and villainy"—or maybe that was some other old guy with a beard describing the happy-hour crowd at Mos Eisley. In any case, Bill has published a good deal of persuasive disparagement of the quality and depth of the 1961 Yankees’ bench, aside from their backup catching). And I don’t want to question the Yankees’ front office judgment in keeping Richardson on the team, nor the good men who’ve devoted hours and years of their lives to fine-tuning WAR. Maybe I’m clueless, but if you call something "Wins Above Replacement," a negative number means to me a player whose contributions were below replacement, and that seems harsh regarding Richardson.

I was able to compare the ratios of the Yankees’ top dozen players by WS and WAR (I gave up when the prospect of figuring out the proportion of WS to negative WAR presented itself), and they seem to linger somewhere around 5 or 6 Win Shares for every win of WAR, though the disparity between Yogi Berra’s ratio (7.3) and Clete Boyer’s (3.8) seems disturbingly large, nearly double. Perhaps playing time accounts for the disparities, but if that’s so, shouldn’t two full-time Yankees like Mickey Mantle (646 plate appearances) and Bill Skowron (608 PA) have ratios closer than they are (4.7 and 6.9)? Both WS and WAR are attempts to quantify players’ contributions, so I was a bit surprised to see them diverging as widely as they did at times.
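As a quick check on the ratio column, here is a minimal Python sketch that recomputes WS/WAR for the top dozen 1961 Yankees from the chart above and reports the extremes. The numbers are copied from the chart; the names `TOP_DOZEN` and `ws_war_ratios` are illustrative only.

```python
# Win Shares and bWAR for the top dozen 1961 Yankees, copied from the chart above.
TOP_DOZEN = {
    "Mickey Mantle": (49.7, 10.5),
    "Roger Maris": (35.5, 6.9),
    "Elston Howard": (29.5, 5.3),
    "Whitey Ford": (21.9, 4.2),
    "Luis Arroyo": (21.8, 3.5),
    "Tony Kubek": (21.4, 3.4),
    "Bill Stafford": (16.8, 3.8),
    "Bill Skowron": (16.5, 2.4),
    "Johnny Blanchard": (16.4, 2.6),
    "Yogi Berra": (15.4, 2.1),
    "Clete Boyer": (14.5, 3.8),
    "Ralph Terry": (14.3, 2.3),
}


def ws_war_ratios(players):
    """Return {name: WS/WAR}, rounded to one decimal as in the chart."""
    return {name: round(ws / war, 1) for name, (ws, war) in players.items()}


ratios = ws_war_ratios(TOP_DOZEN)
low = min(ratios, key=ratios.get)
high = max(ratios, key=ratios.get)
print(f"lowest:  {low} {ratios[low]}")
print(f"highest: {high} {ratios[high]}")
print(f"spread:  {ratios[high] / ratios[low]:.2f}x")
```

A straight ratio like this quietly assumes the two scales share a zero point, which Win Shares (zero wins) and WAR (replacement level) do not; that is the objection Mike137 raises in the comments.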

When I saw Bobby Richardson literally fall off the WAR chart, I thought "Ah ha! Win Shares thinks more highly of defensive players who don’t hit much!" but the very next thing I noticed was that Clete Boyer, also a light-hitting defensive specialist, ranked much higher on the WAR chart and lower on the WS chart, the opposite of Richardson, which kind of blew a crater in that theory.

So that’s where I began comparing the two methodologies: what did Clete Boyer do in 1961 that made WAR rank him so far (28 places) above Richardson, and what did Richardson do in 1961 that made Win Shares rank him so close (2 places) to Boyer? The answer to this one is clear: WAR says that Boyer fielded third base with exceptional skill (2.9 dWAR), while Richardson at second base did his best to negate Boyer’s contributions (-0.9 dWAR).

Another anomaly was Richardson’s stats compared to those of his opposite number, leftfielder Hector Lopez, who ranked thirteen places apart on Win Shares (#19) and WAR (#32). Unlike Richardson, Lopez was an inept fielder at a low-priority defensive position, yet he also did much better in Win Shares than he did in WAR. Lopez had a brutal year with the bat, which was the only reason to put him on the roster in the first place, so I’d also like to know how WAR rates him so close to Richardson: I mean, Lopez couldn’t field at all (he was known as "What a Pair of Hands!" in the same way that Hank Aaron was known as "Bad Henry") and in 1961 he batted even worse (.596 OPS) than he fielded. Both Boyer and Richardson had higher OPSes, which is saying something, so I don’t understand how WAR makes him slightly better overall than it makes Richardson in 1961 (which isn’t saying much). It’s hard to see why Lopez rates spot #32 on the WAR chart while Richardson rates spot #34: Lopez didn’t play every day while Richardson literally did play every day (274 PA as opposed to Richardson’s 704) so where exactly did Lopez outplay him? Not in the batter’s box, either quantitatively or qualitatively, and it’s impossible to grok how a part-time leftfielder with bad hands could conceivably have a defensive edge over a middle infielder with five lifetime Gold Gloves. But that’s exactly what WAR seems to assert: it credits Lopez with a positive 0.3 dWAR and debits Richardson with that nasty negative -0.9 dWAR.

(Lopez’s raw stats, incidentally, juxtapose neatly with those of supersub Johnny Blanchard: each had the exact same opportunities in 1961, 93 games and 243 ABs. Blanchard had a legendary career year while Lopez had one of the most unproductive years of his career. Learn something every day. But let’s leave Hector out of this discussion for now, and try to make sense of WAR’s disparaging evaluation of Richardson’s defense in 1961.)

The mysteries here derive from the different evaluations of these players’ defense. Their offensive play seems comparable, Boyer in 1961 earning 6.9 offensive Win Shares and Richardson 5.3, which seems about fair. I could see how Boyer gets judged a fielding superstar: I actually did see him play for much of his career, and he was a flashy, spectacular fielder. But I’ve always thought Richardson played pretty good defense, too. WAR doesn’t see it that way, but Win Shares does. Win Shares makes Richardson a pretty smooth glove-man in 1961, crediting him with 6.4 defensive WS, more than half his total WS for the year. So where one method sees Richardson as a defensive liability, the other sees him as a defensive star.

It is not necessarily to be expected that WS and WAR should agree on their evaluations of every player (I was actually impressed that the top twelve Yankees on the Win Shares rankings placed as close as they did to the WAR rankings), but when they diverge sharply, as with Richardson and Boyer and Lopez in 1961, that’s probably a good place to start examining how each methodology works. I’m sure I’m misunderstanding some vital aspects of each one, but it does seem curious that they would reach such different evaluations of a few players. The one player on the 1961 Yankees whom WAR and WS disagree about most is Richardson, and the place they disagree most is his defense. According to WAR, Richardson was literally nothing defensively: for his entire career, WAR has him at -0.0, which I take to mean he was ever-so-slightly below the replacement level for an AL second baseman in the 1960s. That is certainly a minority assessment, since Richardson was a regular at second base for eight seasons, 1959-1966, and he won those five Gold Gloves.

I’m no fan of Gold Gloves as an accurate metric of fielding excellence, and less of a fan of Bobby Richardson, but even I have to view WAR’s assessment skeptically. My chief complaint about Gold Gloves is that they are often awarded to the best-hitting player at a given position rather than the best-fielding player, on the somewhat sound reasoning that players who can’t hit much are going to be playing partial games and often benched to fit a bat into the lineup, so they might not play enough in the field to outshine their offensively more gifted brethren. But the Yankees manager, as Bill has pointed out several times, mistook Richardson’s modest ability to hit singles for true offensive prowess, hence the 700+ plate appearances in 1961, so my chief complaint about the Gold Gloves is demoted to Assistant Chief here.

A superficial examination of Richardson’s career stats at second base doesn’t really support the five Gold Gloves: according to baseball-reference (https://www.baseball-reference.com/players/r/richabo01-field.shtml) he posted a career fielding percentage at 2B (.978) almost precisely matching the league average (.979), and his Range Factor per 9 innings (5.12) was just below the league’s figure of 5.21. In 1961 specifically, the figures are in that same neighborhood: .978/.978 and 5.05/5.24. Still, he played 161 games at second base in 1961 and won his first Gold Glove.

Those aren’t great raw stats by any means (looking over the league, I’d say that Billy Moran deserved the GG in 1961) but I’d like to know how WAR makes Richardson into not only a defensive liability in 1961 but by far the 1961 Yankees’ single greatest defensive liability: of the 36 players they rank defensively, including pitchers, 21 rank at 0 defensive WAR or better, and of the other 15 Yankees with negative defensive WAR, 14 rank at -0.0 or -0.1. Only Richardson ranks below that: -0.9.

As a whole, the team ranked pretty well in defensive WAR, a team total of 5.4 Wins Above Replacement, so Richardson really stands out in that group for his defensive ineptitude. The three Yankees who score the highest in defensive eptitude are Richardson’s three regular infield mates, Boyer (with that whopping 2.9 dWAR), Kubek (1.3) and Bill Skowron (0.6), which you would think would make a butcher like Richardson stand out all the more, yet Richardson earned that infield’s only Gold Glove in 1961.

Since I haven’t tried to penetrate the shrouded secrets of how WAR is computed, and probably would mess up the math if I did, I will just leave these questions open for my betters to answer: on what basis does WAR make Richardson such a poor fielder in 1961? Why does it assess his 1961 fielding so much differently from Win Shares’ 1961 assessment? How come WAR makes Richardson into a positive fielder in 1963, his peak according to WAR, which is also his defensive peak according to Win Shares?

I’ll share some of the high points, such as I understand them, to see if anyone can come up with a plausible explanation of Richardson’s atrocious dWAR in 1961, all taken from https://www.baseball-reference.com/teams/NYY/1961-fielding.shtml#all_players_advanced_fielding_2b . I’ll post the highlights that seem meaningful or comprehensible to me: the percentage of Richardson’s time at 2B when a right-handed batter was at the plate was 69%, exactly the league average, ruling out any sort of handedness bias, and his percentages of balls in play and of ground balls in play were also exactly at the league average. (Boyer’s comparable numbers are also virtually identical to the AL as a whole, the exception being that Boyer had 75% of Balls In Play compared with the AL’s 74%.) In the more mundane, non-advanced fielding stats, Richardson’s 1961 stats were just about the league standard, a hair below in both fielding percentage (.978/.979) and Range Factor per 9 Innings (5.05/5.09). In comparison, Boyer at 3B was just a hair above league average in fielding percentage (.967/.966) and in RF/9 (3.78/3.73), which does indicate that Boyer had a better year with the glove than Richardson, perhaps, but not the night-and-day difference that WAR presents. Richardson also played over 200 more innings at 2B than Boyer played at 3B (though Boyer also played 75 innings at SS, with no particular statistical distinction). All in all, I’m still scratching my head over this one: what accounts for Richardson’s abysmal defensive rating in 1961 according to WAR?
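To make the "a hair below / a hair above" comparison concrete, here is a small Python sketch that expresses each traditional rate stat as a percent difference from the league figure, using only the 1961 numbers quoted in the paragraph above. The helper name `pct_vs_league` is hypothetical; note that these rates say nothing about the batted-ball and hit-allocation data that Total Zone actually uses.

```python
def pct_vs_league(player, league):
    """Percent difference of a player's rate stat from the league rate."""
    return 100.0 * (player - league) / league


# 1961 figures as quoted above: (player, league average at his position).
RICHARDSON_2B = {"FPct": (0.978, 0.979), "RF/9": (5.05, 5.09)}
BOYER_3B = {"FPct": (0.967, 0.966), "RF/9": (3.78, 3.73)}

for name, stats in (("Richardson (2B)", RICHARDSON_2B), ("Boyer (3B)", BOYER_3B)):
    for stat, (player, league) in stats.items():
        print(f"{name:16} {stat:5} {pct_vs_league(player, league):+.1f}%")
```

Both players land within about a percent and a half of league average on every line, which is the gap the paragraph finds hard to square with a 2.9 versus -0.9 dWAR split.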

Mind you, I’ve got no dog in this fight: I couldn’t care less if WS and WAR had some general disagreement on assessing fielders or hitters, or if they disagreed on the degree to which Richardson’s fielding stunk or excelled. It’s just that what’s going on here seems potentially instructive and, unless Richardson’s fielding is a one-off that never occurs again, this problem should be able to point to a failing in one methodology or the other.

Personally, I’ve come to think in recent years that Richardson was generally over-rated, benefiting beyond his actual contributions from his association with the Yankees’ dynasty and from batting lead-off for the multi-championship teams. If WS agreed even partly with WAR’s assessment of Richardson’s fielding in 1961, I’d probably chalk that up (if I had even noticed it) to a different emphasis in the two methodologies, but this is not a partial agreement or any kind of agreement, which prompts me to examine this disparity more closely.

The WAR assessment is out of whack with several things:

1) Richardson’s Gold Gloves;
2) Richardson’s general reputation as an adroit fielder, which I can support beyond my own subjective impressions with his consistent "1" ratings in Strat-O-Matic, their highest rating;
3) his holding down the regular 2B job for better than five years with no one ever claiming the Yankees needed a defensive upgrade (unusual for a questionable fielder, especially in the New York press, who focus relentlessly on the hometown teams’ weaknesses);
4) his winning the job in the first place under Stengel, who insisted on excellent fielding, especially on the DP, from his middle infielders;
5) the 1961 Yankees’ W-L record, hard to feature with a truly brutal defensive middle infielder;
6) Richardson being virtually never substituted for defensively in late innings at 2B, which a butcher certainly would have been on a team that so often led late in games;
7) his raw fielding stats; and finally
8) his Win Shares rating.

That said, I’d be willing to entertain the notion that Bobby Richardson was an unacceptable fielder in 1961 if someone could explain the WAR rating in a way that makes some sense. Joe Posnanski’s column on Bill James’ antipathy to WAR (http://joeposnanski.com/more-on-war/ ) reduced WAR’s formulas to sophisticated doubletalk and mumbo-jumbo. (The nut quote from Bill is "They tend to get so far into the data, throw up their arms and make a wild guess.") There is a far more temperate follow-up piece by Posnanski (https://medium.com/joeblogs/even-more-on-war-493267e5d49 ) that I recommend strongly, taking into account a lot of the blowback his first column got from the gentlemen defending WAR, the nut quote from this column being from a fictional religious character in a Woody Allen movie who, pressed to the wall, says that he will "always prefer God to truth." I like that quote, because the religion:baseball analogy always appeals to me. I have always viewed baseball as essentially rational (and to my mind independent of any system of belief), but there is an element of beauty, of balance, of mystical justice in baseball that encourages others to find something almost spiritual there.

WAR and Win Shares exemplify the rational element to me, but at the heart of Posnanski’s followup piece is the part of WAR (and WS) that leaves open a small percentage of baseball that cannot be explained rationally. That percentage of the game, if I’m understanding Joe correctly, is 13%. That is, according to one of WAR’s defenders quoted in the piece:

WAR accounts for about 87 percent of all runs in baseball. That’s pretty darned good and not atypical for the various WAR systems. Does that leave 13% of what happens on a baseball field unaccounted for? It sure does. But again, so what? WAR doesn’t pretend this variance does not exist; it merely refuses to punish individual players for the inherent volatility we enjoy seeing in the game.

Is Richardson’s 1961 defensive WAR simply an instance of the 13% of the game that WAR refuses to account for rationally? Is the team’s 13% "variance" concentrated in one player, somehow, while the rest of the team’s WAR makes perfect sense? Are we to chalk up Richardson’s poor WAR showing in 1961 to bad luck that will presumably even out over the course of his career? Is it just some sort of weird anomaly?

I’m assuming that it’s not simply an anomaly: that the explanation for this unaccountable assessment is more complex than "Hey, shit happens." I’d be disappointed to find that 13% of WAR’s findings are skewed and that WAR’s defenders are OK with that figure. What I came to WAR (and to Win Shares) for in the first place, perhaps naively, was some sort of all-purpose metric that yields a close approximation of each element that goes into winning baseball games. My own definition of a close approximation is much more like 99% or 98%; if we’re really talking about a mere 87% accuracy rate, then I’ve been deceived in my expectations of either metric.

This puts me squarely in the group of fans who, in Joe Posnanski’s terms, "want to believe that [WAR] uses extremely complex calculation and reasoning to give us a wonder-stat, one that answers all questions and sees all worlds. Is that fair? No. But it is real."

I may be misconstruing the meaning of "WAR accounts for about 87 percent of all runs in baseball" but if I’m not, WAR, and maybe Win Shares too, seem far less reliable to me today than they did yesterday.

COMMENTS (22 Comments, most recent shown first)

Brock Hanke
Steven - Bill's long essay on the 1961 Yankees is on pp. 257ff of the New Historical, in the 1960s section. The basic claim is that the 1961 Yankees were not a great team, no matter how you look at it. It does not, however, have the quote you cited, so there must be another such essay somewhere. Just a couple of FYIs from the essay. If Bobby Richardson had taken any time off, the backup would have been Billy Gardner, who hit .212 in 99 AB, which probably explains why Richardson didn't take any time off. Gardner got those 99 AB as the backup to Clete Boyer at third, but was also the backup at 2B, if needed. Joe DeMaestri was the backup SS. He hit .146 in 41 AB. Hector Lopez was, essentially, Yogi Berra's platoon partner in LF, although he played a little at other places. The Yankees' LF defense had to have been bad. Berra was a 36-year-old catcher, neither tall nor fast, and Lopez was Hector Lopez. The essay does not address Richardson's defense except to say that he won a Gold Glove, and that the 1961 Yankees (Howard, Skowron, Richardson, Boyer and Kubek) had a spectacular infield defense as a team.
4:53 AM May 28th

Steven Goldleaf
And the French guy whose name escaped me was my old pal Blaise Pascal, though a lot of others have made similar apologies: https://quoteinvestigator.com/2012/04/28/shorter-letter/
6:48 AM May 24th

Steven Goldleaf
It is interesting that there is at least one other reader of Ehrman's blog here. Did you know, David, that he is a graduate of Lawrence (KS) high school and that Bill claims never to have heard of him?
6:29 AM May 24th

Steven Goldleaf
David--ah, but I don't charge you money to read the rest of the article as Ehrman does, so you can just skip all my redundancies. As some Frenchman wrote (only in French) "My apologies for writing a long letter--I had not the time to write a short one." Seriously--my apologies. My final edit usually catches most of the repetition, but I didn't do such a great final edit this time.
5:38 PM May 23rd

nettles9
Challenge The Yankees is back... https://www.facebook.com/ChallengetheYankees/
11:59 AM May 23rd

CharlesSaeger
A big issue with bWAR fielding is that it isn't transparent. I get the basic idea of the various versions of Total Zone, but we have no way of double-checking to see if there's bad data. Bad data can happen on a team level; the 1991 Braves are 100 runs below average with the best DER in the league. Pre-1988, I kinda trust the infielder ratings; it's a guarded trust, inasmuch that the mistakes even out over a few years. I don't trust the outfielder ratings, especially corner outfielder ratings.

(Why 1988 as a dividing line and the difference in trust? Well, 1988 is when Total Zone starts using hit location. Before that, it allocates hits allowed by a batter's outs. Say, Jim Rice had 13% of his outs go to second base, so when he gets a hit, the second baseman gets blame for 13% of the hit. For infielders, both hits and ground outs are pulled, so this kinda works. For outfielders, fly outs do not go to pull side but hits do, so I'm really skeptical there.)
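The pre-1988 allocation described above can be sketched in a few lines of Python. This is only an illustration of the idea as CharlesSaeger states it, not Baseball-Reference's actual Total Zone code; `allocate_hit` is a hypothetical helper, and every out count except the 13% to second base is made up.

```python
def allocate_hit(out_distribution):
    """Split responsibility for one hit among fielders in proportion to
    where the batter's outs went (the pre-1988 approach described above).

    out_distribution maps position -> number of outs the batter made there.
    Returns position -> fraction of each hit charged to that fielder.
    """
    total_outs = sum(out_distribution.values())
    return {pos: n / total_outs for pos, n in out_distribution.items()}


# Hypothetical batter: 13 of every 100 outs go to the second baseman,
# echoing the Jim Rice example; the other counts are invented.
outs = {"1B": 8, "2B": 13, "3B": 10, "SS": 16, "LF": 18, "CF": 20, "RF": 15}
shares = allocate_hit(outs)
print(shares["2B"])  # 0.13
```

The outfield objection follows directly: the outfield entries come from fly outs, which are not pulled, while the hits being charged are, so the shares misplace blame for corner outfielders in a way they mostly don't for infielders.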

As for Richardson, comparing the Yankee second basemen to the rest of the league might clear up a few things. They had 431 PO; reducing the league to the Yankee team's PO-SO gives us 400 PO, so they're +31. Assists are the opposite: the Yankee second basemen had 389 assists, and if I scale the league to the Yankee team assists, we have 447 expected, or -58. Since almost everyone ignores PO for infielders nowadays (other than for first basemen), that's the big number.

There's assuredly a lefty-righty issue: just scaling PA by LHB for the league gives us 2119 expected, but the Yankees only faced 1868. If I do AB-H-SO by LHB, the Yankees allowed 960 against 1371 expected. Giving the second baseman a tenth of extra outs against lefties as the difference, we have -17 assists.
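Read together, the two paragraphs above amount to "assists above expected, plus partial credit for the outs that never came because the Yankees faced fewer left-handed batters." Here is a minimal Python sketch under that reading. It starts from the figures quoted (389 actual versus 447 expected assists, 960 versus 1371 expected left-handed outs, a one-tenth share), because the league totals behind the 447 aren't given; `lhb_adjusted_assists` is a hypothetical name.

```python
def lhb_adjusted_assists(actual, expected, lhb_outs, lhb_outs_expected, share=0.1):
    """Assists above expected, crediting the second baseman with `share` of
    the outs 'missing' because the team faced fewer left-handed batters."""
    raw = actual - expected
    missing_outs = lhb_outs_expected - lhb_outs
    return raw, raw + share * missing_outs


raw, adjusted = lhb_adjusted_assists(389, 447, 960, 1371)
print(f"raw {raw:+d}, adjusted {adjusted:+.0f}")  # raw -58, adjusted -17
```

This reading reproduces the -17 in the comment: a tenth of the 411 missing left-handed outs offsets about 41 of the 58-assist shortfall.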

Next errors and double plays. A typical fielding percentage would mean 19 errors, and the Yankee second basemen made 18, so that doesn't mean much. I figured expected double plays as team A/(1B+BB+HBP) divided by league total of the same, and that comes to 111. The Yankee second basemen got 140; that's a big boost.

Now, singles. We really don't care about doubles and triples, since those don't go past second base much. They typically go to left and right in the power alleys. (I've refrained from telling my daughter that the hard line drive she hit on Saturday would have been an easy double had she hit it to left for the same distance instead of just a hard single. That boy in left wasn't gonna catch anything while he was dancing around.) I'll ignore park effects and lefty/righty for this. Anyways, the Yankees allowed 900 singles, and per ball in play we'd expect 956.

So, what's my crude estimate? I'll give a run value of a seventh of each extra assist, half of each error saved, a sixth of each extra double play, and a twelfth of each single saved. That gives me +12 for Yankee second basemen. BB-ref.com gives them -8. That's a 20 run discrepancy.

Now, there might be a good reason why we have a 20 run discrepancy. For example, I didn't apply a park factor, when Yankee Stadium (according to Seamheads) had a 0.96 park factor for singles, which would lop off a run and a half or so. I could have included more data (like steals and reached on error) in the double play estimates. Having a record of each hit location would help matters.

Like I said above, the luck does even out over the course of a career. Richardson is +15 runs for his career; I have Yankee second basemen from 1957-1966 as -278 assists, +7 errors, +55 double plays, and +317 singles. The assists wouldn't be so far below average were I to apply a lefty/righty adjustment, but the singles wouldn't be so far above average were I to apply a park adjustment. Applying the weights I gave above comes to -0.6 runs, career, so -0 for an average year by this system and -1.5 by Total Zone. Evens out, more or less.
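The run weights used in this comment (a seventh of a run per extra assist, half per error saved, a sixth per extra double play, a twelfth per single saved) can be written as one small function. A sketch, assuming the weights exactly as stated and treating "+7 errors" as errors saved; the 1961 inputs are the differences quoted earlier in the comment (-17 assists after the lefty adjustment, 19 vs 18 errors, 140 vs 111 double plays, 956 vs 900 singles). `fielding_runs` is a hypothetical name.

```python
from fractions import Fraction

# Run weights as stated in the comment above.
WEIGHTS = {
    "assists": Fraction(1, 7),
    "errors_saved": Fraction(1, 2),
    "double_plays": Fraction(1, 6),
    "singles_saved": Fraction(1, 12),
}


def fielding_runs(deltas):
    """Sum weighted above/below-average counts into a crude run value."""
    return float(sum(WEIGHTS[k] * v for k, v in deltas.items()))


career = {"assists": -278, "errors_saved": 7, "double_plays": 55, "singles_saved": 317}
season_1961 = {
    "assists": -17,
    "errors_saved": 19 - 18,
    "double_plays": 140 - 111,
    "singles_saved": 956 - 900,
}

print(f"1957-66: {fielding_runs(career):+.1f}")
print(f"1961:    {fielding_runs(season_1961):+.1f}")
```

The career inputs reproduce the -0.6 runs above. The 1961 inputs, as quoted, come to about +7.6 rather than the +12 reported earlier (an assist term of +17 instead of -17 would give about +12.4), so the season total may rest on an input not spelled out in the comment.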
9:34 PM May 22nd

klamb819
Adding to what Mike137 said:

Win Shares has a 0-win baseline, and WAR's baseline is 47-2/3 Pythagorean wins (based on a runs created formula derived entirely from linear weights).

So for a team that wins exactly its Pythagorean expectation (with a 1.87 exponent), 143 Win Shares will equal replacement level. To determine a meaningful ratio of Win Shares to WAR, you would have to first subtract a player's share of those 143 Win Shares among all its players. I think the best way to apportion those 143 Win Shares would be on the basis of playing time, but that requires equating a position player's PAs or innings & a starting pitcher's IP & a reliever's IP. So an easier (but less precise) way would be to apportion them according to each player's percentage of his team's Win Shares.

Let's say RLWS is a player's share of those 143 Replacement-Level Win Shares. The formula for the ratio of WS to WAR, then, is

WS/WAR ratio = (WS - RLWS) / WAR

in the same way that C° = (F° - 32) * 5/9.
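Here is a minimal Python sketch of that adjusted ratio, using the "easier (but less precise)" apportionment: each player's share of his team's Win Shares. Assumptions: the flat 143-Win-Share replacement pool, no correction for the gap between WAR wins and actual wins, and a 1961 Yankees total of 327 Win Shares (three per win for a 109-win team); the player figures come from the chart in the article. `adjusted_ratio` is a hypothetical name.

```python
REPLACEMENT_POOL_WS = 143  # 47-2/3 replacement-level wins x 3 Win Shares per win
TEAM_WS_1961_NYY = 3 * 109  # Win Shares are three per team win


def adjusted_ratio(ws, war, team_ws, pool=REPLACEMENT_POOL_WS):
    """(WS - RLWS) / WAR, with RLWS apportioned by the player's share of team WS."""
    rlws = pool * ws / team_ws
    return (ws - rlws) / war


for name, ws, war in [("Mantle", 49.7, 10.5), ("Berra", 15.4, 2.1), ("Boyer", 14.5, 3.8)]:
    raw = ws / war
    adj = adjusted_ratio(ws, war, TEAM_WS_1961_NYY)
    print(f"{name:7} raw {raw:.1f}  adjusted {adj:.2f}")
```

One property worth noticing: because this apportionment removes the same fraction (143/327) of every player's Win Shares, it multiplies every raw ratio by the same constant, roughly 0.56 here. It moves the ratios toward a common zero point but leaves Berra's still about double Boyer's; the playing-time apportionment klamb819 prefers would not behave that way.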

You also have to get rid of the difference between Pythagorean wins and actual wins. (Except even there, WAR doesn't always meet its own stated goal: the 2017 Padres, for example, had a Pythagorean expectation of 59 wins, but WAR credited them with 61.5 wins, 47.7 replacement level plus 13.8 total WAR.) So what you really have to get rid of is the difference between WAR wins and actual wins.

So instead of using 143 Win Shares as a constant, the number of replacement-level Win Shares becomes:

143+[3*(warWins — actualWins)]

Using the 2017 Yankees as an extreme example. . .
. . . 143 + [3 * (100.7 - 91)]
= 143 + (3*9.7)
= 143 + 29
= 172 is the number of Win Shares to be apportioned among the players to make WS comparable to WAR.

Or at the other end of the spectrum, the '17 Padres:
143 + [3 * (61.5 - 71)]
= 143 + [3 * (-9.5)]
= 143 - 28.5
= 114.5, rounded to 115.

As you can see, just by using Pythag wins instead of actual wins — forgetting all its other flaws — WAR had an error range last year of 57 Win Shares, or 19 actual wins.
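klamb819's per-team replacement pool, written out as a Python function; the win totals are the ones quoted in the comment, and `replacement_pool` is a hypothetical name.

```python
BASE_POOL_WS = 143  # replacement-level Win Shares when WAR wins equal actual wins


def replacement_pool(war_wins, actual_wins, base=BASE_POOL_WS):
    """Win Shares to apportion as replacement level: 143 + 3 * (WAR wins - actual wins)."""
    return base + 3 * (war_wins - actual_wins)


yankees_2017 = replacement_pool(100.7, 91)
padres_2017 = replacement_pool(61.5, 71)
spread = yankees_2017 - padres_2017
print(f"2017 Yankees: {yankees_2017:.1f}")
print(f"2017 Padres:  {padres_2017:.1f}")
print(f"spread: {spread:.1f} Win Shares = {spread / 3:.1f} wins")
```

Without the intermediate rounding to 172 and 115, the spread comes out to 57.6 Win Shares, about 19.2 wins.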
5:17 PM May 22nd

Mike137
I think there is a fundamental error here. Consider temperature measured on the Centigrade and Fahrenheit scales:

104 F = 40 C, ratio = 2.6
86 F = 30 C, ratio = 2.9
68 F = 20 C, ratio = 3.4
50 F = 10 C, ratio = 5.0
32 F = 0 C, ratio = ???

It makes no sense to take a ratio, since different zero points are used. A replacement level team will have about 145-150 Win Shares, but zero WAR.
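Mike137's temperature analogy in a few lines of Python: the naive F/C ratio drifts and finally breaks at 0 C, while the ratio measured from a common zero, (F - 32)/C, is the constant 9/5. This is only the temperature case; for WS and WAR the offset is the replacement-level pool discussed in the comments, not a single constant. `f_from_c` is a hypothetical helper.

```python
def f_from_c(c):
    """Fahrenheit from Celsius."""
    return c * 9 / 5 + 32


for c in (40, 30, 20, 10, 0):
    f = f_from_c(c)
    naive = f"{f / c:.1f}" if c else "undefined"
    offset = f"{(f - 32) / c:.1f}" if c else "undefined"
    print(f"{f:5.0f} F = {c:2d} C   F/C = {naive:9}  (F-32)/C = {offset}")
```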
10:31 AM May 22nd

Riceman1974
Great piece. Defense is clearly where WAR loses me, as I've commented on many times before, and why I think Win Shares is better. The main difference in this particular case is that Win Shares heavily rates the double play for 2B and WAR doesn't. Bill believes the 2Bmen's no. 1 job defensively is turning two, and I have to agree with him, as does every GM and Manager since 1940 at least. But Bill credits them by their expected double-plays, and how far above or below expectation the TEAM is. Raw double-play totals are meaningless, so "double-plays minus expected double-plays" is the key stat. The 1961 Yankees led all of MLB with a plus 29.8 rating, meaning they turned two 30 more times than expected. Win Shares basically says a team can't do this if their 2Bman is crap. WAR says the Yanks would have been better with a AAA scrub turning two. Frankly, my dears, that is bullshit.
4:28 AM May 22nd

MarisFan61
Steven: I wouldn't guess it's likely you'd be pleased with any answer to a Hey Bill on that. Two reasons (both IMO of course): It's a strange comparison, and it's not very interesting.
I hope you'll send it, though, because I'd love to see what he does do with it.
1:29 AM May 22nd

joeashp
Hey, the board game was called "Challenge the Yankees". I used to play with my cousin; he was always the Yanks and I was always the All-Stars. (His game!). I remember it was a two dice game, with results from 2-12. Each card had a photo and signature; I remember that I could not read Hank Aaron's, thinking it said "Henry Rawn". Thanks for stimulating the memory!
7:38 PM May 21st

Steven Goldleaf
Kind of cries out for a "Hey Bill," don't it?
6:58 PM May 21st

MarisFan61
My .02 cents (which BTW is less than 2 cents but probably not many would notice if I didn't say): :-)

That's not a thing on which many people would have an intuitive idea.

Despite having lived through, celebrated with and suffered with (respectively), and having read copiously about the '61 Yanks and '62 Mets, I haven't the slightest gut feeling on what you're asking there.

I could guess, as could anyone, but I wonder, is that a thing that anyone here has any feel about?
I'd guess not.
6:00 PM May 21st

villageelliott
A friend always told me that if one is looking for sympathy, it is in the dictionary between "$hit" and "Syphilis."

Actually, I am more chagrined at not including "tertiary."

If you could pitch that Yankee Killer Frank Lary every day...4-2 with an ERA of 4.34...
The Yankees would have a winning percentage of .333

5:57 PM May 21st

Steven Goldleaf
"Syphilis" didn't cut it for you, Elliot?

It occurs to me now that my wild guess about how the 1961 Yankees would have done in an actual game against the best players on all the other MLB teams put together, about .250, could be computed accurately--but only if we had a reliable system of judging each player's ability. I just guesstimated it would be about what the 1962 Mets did against the actual National League, but does anyone have a good reason to think it would be much higher or much lower?
5:26 PM May 21st

villageelliott
Anyone who uses "disputatious", "grok", "variance", i.e., "Hey, $hit happens" and "misconstruing" in the same article seems far more reliable to me today than yesterday.

Thank you for shining the light on that unaccountable 13%.

PS: How were you able to circumvent the censors on "$hit"? This may be the most impressive aspect of your extremely impressive article.
4:16 PM May 21st

MarisFan61
(looks like the [italicizing] isn't working)
12:31 PM May 21st

MarisFan61
Regarding how we should regard "WAR":

I do regard it as a "close approximation" -- but without reliability. :-)

That's how I wish the whole world regarded it -- i.e. it's very good but not always. Never assume it to be accurate for any given player, or for any given anything. Always realize that it might be off, sometimes even very off. Sometimes one or more of its components for a player might be accurate, but others not.

And, what's so wrong with that?? It still makes for a very useful tool.

And, maybe most importantly, let me ask: WHY WOULD ANYONE EXPECT MORE THAN THAT??
Why would you expect a thing that can't possibly take all aspects of how the game is played into account (anyone here think it possibly can?), and which can't evaluate even the things it does take into account with full accuracy .......why would you expect such a thing to be a good indicator in all instances, and why would you be surprised that it's sometimes way off?

That's the main thing, above all else, that sometimes puzzles me about what I see in sabermetrics: the expectation of great accuracy and certainty in the findings.

That said, some methods are more reliable and more indicative than others -- and I find the Win Shares system more credible than the "WAR" system, both in its approach and in its results -- still with the caveats that I stated, but real real good.
As to why "WAR" is so much more widely known and used, I put it all on marketing and luck.

Can't help wondering if Bill doesn't mind that the presumably better system isn't being widely known and used, even though it's his own, because, if other teams aren't using it, who cares. And I wouldn't blame him; that's how I'd see it if I were in his role. That would probably be a reason for me to not market it very aggressively.

I know I've said a few mouthfuls here. :-)
12:30 PM May 21st

MarisFan61
Thanks for looking into it in detail.

Cliff's Notes, bottom line, the skinny:
The system is highly flawed and very arguably is not worth the following that it has.

It doesn't deserve this kind of attention either, but, good that people such as you do it sometimes, because it helps put the message across -- not that it gets across; why, I don't know.

Thanks for this work.
11:17 AM May 21st

doncoffin
(FWIW, looking at BBRef...it shows BR as -0.4 dWAR in 1961, +0.3 oWAR, and -0.7 WAR, which also seems odd to me.)

Just for kicks, I looked at BBRef's leaderboards, checking the Top 10 in MLB in defensive games, putouts, assists, and DPs. Here's how Richardson ranked:

G: 2nd (Jake Wood led with 162 games)
PO: 1st (tied with N. Fox)
A: 7th (and 5th in AL; AL leader was Chuck Schilling, with 73 more than BR)
DP: 2nd (1st in AL; behind some guy who played for the Pirates)
Total Zone Rating: Not in MLB top 10 (Schilling was the leader)
Fielding %: 6th (5th in AL; Schilling was 1st)
RF/9: Not in top 10 (Maz was 1st; Lumpe 1st in AL; BR: 5.05; #10 was Marv Breeding, 5.47, or about 70 more plays per 162 games. I actually don't think Breeding should be on the list; he played only 80 games at 2B)

In addition, the Yankees did not have a low-strikeout pitching staff, ranking 6th in MLB and 4th in the AL in K/9...so the Yankee fielders were not responsible for a disproportionately high number of balls in play. If anything, their pitchers' Ks look to have (slightly) reduced the number of plays for their fielders.

So I'd have to say that dWAR has some issues here. BR does not rank badly on any of the "counting stat" defensive numbers, but does not do well on 2 of the 3 "analytical" stats. Still, to have him as a worse-than-replacement 2B (rather than a below-average 2B, which I'm not sure he was) seems odd.
9:42 AM May 21st

MichaelPat
This is interesting; thanks for writing.

Maybe it's a double play issue, I thought. But no, NYY led the AL with 180, and Richardson was involved in 136 of them. (Over in the NL, Pittsburgh led with 187, with some guy named Mazeroski involved in 144 of them, in seventy or so fewer innings than Bobby.)

The only place Richardson scores badly is in Rtot, which "combines Rtz, Rdp, Rof and Rcatch into a total defensive contribution" all of which is way beyond my pay grade.

The Yankees as a team do very well on Rtot, scoring at a league leading +73; the average team was at +18. The Yankee total includes a -7 from Bobby.

I would also note that the Wins Above Average method used at BRef is really hard on second basemen. In the AL, only three teams got positive contributions at 2B (Wash, KC, Bos), and the average WAA was -1.2; Bobby's Yankees were at -2.8. It was worse for the NL, where only Milwaukee got a positive score at 2B (Frank Bolling in 1307 innings). The average team was at -1.3 at 2B; Maz's Pirates were at -0.5.

Something does indeed seem amiss here....
9:30 AM May 21st

ksclacktc
3-2-1....WAR defenders!

Steven, there are so many of these examples of defensive anomalies that I don't care to go into them anymore. I think there are a ton of possible reasons why certain variables could sway the ratings incorrectly, not the least of which are team positioning and team pitching styles, which no one can know with certainty. I hearken back to a day when Bill said that a statistic that is surprising (too often?) is probably wrong. Fans either believe in the defensive ratings or they don't. I don't, because even if they are correct in their appraisals 75-80% of the time, they don't match up with the eye enough for my tastes. Hitting is well measured and utilized in sabermetrics. The noise inherent in any hitting stat just doesn't skew the answer to the extreme the way the very flawed defensive stats do.

Let us not forget the opposite case: players given high WAR defensive ratings who were neither regarded nor observed as strong defenders.
9:14 AM May 21st

COMMENTS (22 Comments, most recent shown first)
