Jack Kralick?

January 3, 2019
 


 

              I’ve been doing some work on pitchers since Christmas, and I have a long list of short articles that I am hoping to write based on that work.  But in the process of doing that work, I have noticed that, according to Baseball Reference, the leading American League pitcher in WAR for 1961 was. . . .wait for it, wait for it.   Jack Kralick.   Actually, I am not sure that Jack Kralick is the goofier one; it may be that the goofier one is Don Cardwell.   Baseball Reference says that the best pitcher in the National League in 1961, the best pitcher in the majors, even better than Jack Kralick, was Don Cardwell.  

              Now, I always liked Jack Kralick.  He was a pretty good pitcher, a lefty in the mold of Larry Gura and Gio Gonzalez; for that matter, I always liked Don Cardwell and Larry Gura and Gio Gonzalez, too, but I had a special affinity for Jack Kralick because I was listening to the entire game on the radio when he pitched his no-hitter, and he had a perfect game going until he walked George Alusik with one out in the 9th inning, and there had not been a regular-season perfect game in the majors, at that time, since 1922, and I knew this, and I knew that Kralick had a perfect game going although the announcers didn’t mention it, and I was SUPER into the game.   I was listening to history, man.  There were bottle rockets going off in my bloodstream. 

              Anyway, the notion that Jack Kralick was the best pitcher in the American League in 1961 would come as a surprise to Jack Kralick, if he were still with us, and as an absolute stunner to Mrs. Kralick.   We’re going to have to re-carve his headstone.  Whitey Ford went 25-4 in 1961 and was, at the time, generally regarded as the best pitcher in baseball.   Of course in the modern world we don’t rely on the Won-Lost record, so there’s that, and Kralick was pitching in a hitter’s park, so there’s that, and also, there was a real shortage of genuine top-flight pitcher seasons in 1961. Kralick was 13-11 with a 3.61 ERA.   He pitched 242 innings, which was 8th in the league, and gave up 257 hits, which was second in the league.   He gave up 21 homers, struck out 137 batters and walked 64.   These are all very ordinary numbers for him; actually, they are all very ordinary numbers for anybody.   There are a lot of pitchers who have numbers like those.   Why Baseball Reference selected Kralick (or Cardwell, in the National League), why B-R is nominating these gentlemen as the best pitchers in their leagues, is not really apparent.  

              Here is a short stat summary of the best pitchers in each league in 1961. . .I’ll list the top 16 in each league, American League first:

 

First   Last        G    IP   W   L  WPct    H    R   ER   SO   BB  SV   ERA
Whitey  Ford       39   283  25   4  .862  242  108  101  209   92   0  3.21
Luis    Arroyo     65   119  15   5  .750   83   34   29   87   49  29  2.19
Frank   Lary       36   275  23   9  .719  252  117   99  146   66   0  3.24
Jim     Bunning    38   268  17  11  .607  232  113   95  194   71   1  3.19
Don     Mossi      35   240  15   7  .682  237   97   79  137   47   1  2.96
Ralph   Terry      31   188  16   3  .842  162   74   66   86   42   0  3.16
Juan    Pizarro    39   195  14   7  .667  164   73   66  188   89   2  3.05
Steve   Barber     37   248  18  12  .600  194  102   92  150  130   1  3.34
Bill    Stafford   36   195  14   9  .609  168   65   58  101   59   2  2.68
Hoyt    Wilhelm    51   110   9   7  .563   89   35   28   87   41  18  2.29
Camilo  Pascual    35   252  15  16  .484  205  114   97  221  100   0  3.46
Tom     Morgan     59    92   8   2  .800   74   31   24   39   17  10  2.35
Jack    Kralick    33   242  13  11  .542  257  101   97  137   64   0  3.61
Chuck   Estrada    33   212  15   9  .625  159   91   87  160  132   0  3.69
Mudcat  Grant      35   245  15   9  .625  207  118  105  146  109   0  3.86
Billy   Hoeft      35   138   7   4  .636  106   37   31  100   55   3  2.02
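Baseball Reference's pitching WAR starts from runs allowed rather than earned runs, so unearned runs count against a pitcher. A minimal Python sketch, using six starters from the table above, shows how that alone shifts the comparison: Kralick allowed only 4 unearned runs, while Lary and Bunning allowed 18 each. This is only a first step under simple assumptions (whole-number IP as listed, no park, fielding, or league adjustments), not B-R's calculation.

```python
# RA9 (runs allowed per nine innings) versus ERA for six AL starters from the
# 1961 table above. IP are the whole numbers shown there, so this is approximate.

al_1961 = {
    # name: (IP, R, ER)
    "Ford": (283, 108, 101),
    "Lary": (275, 117, 99),
    "Bunning": (268, 113, 95),
    "Pascual": (252, 114, 97),
    "Kralick": (242, 101, 97),
    "Grant": (245, 118, 105),
}


def per_nine(runs: int, ip: float) -> float:
    """Scale a run total to a nine-inning rate."""
    return 9 * runs / ip


for name, (ip, r, er) in al_1961.items():
    unearned = r - er
    print(f"{name:8s} ERA {per_nine(er, ip):4.2f}  RA9 {per_nine(r, ip):4.2f}  unearned {unearned:2d}")
```

On these raw numbers Kralick's RA9 moves ahead of Bunning's and Lary's, though not Ford's; the unearned-run treatment narrows the gap but does not by itself put him on top.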

 

              Statistics at times can do an amazing job of hiding the underlying truth, but this may be a record.  I don’t know that I’ve ever seen a case in which "the truth" was THAT well hidden in the statistics, except possibly in the National League in the same year:

First   Last        G    IP   W   L  WPct    H    R   ER   SO   BB  SV   ERA
Warren  Spahn      38   263  21  13  .618  236   96   88  115   64   0  3.01
Jim     O'Toole    39   253  19   9  .679  229  101   87  178   93   2  3.09
Stu     Miller     63   122  14   5  .737   95   41   36   89   37  17  2.66
Joey    Jay        34   247  21  10  .677  217  102   97  157   92   0  3.53
Sandy   Koufax     42   256  18  13  .581  212  117  100  269   96   1  3.52
Johnny  Podres     32   183  18   5  .783  192   81   76  124   51   0  3.74
Lew     Burdette   40   272  18  11  .621  295  131  121   92   33   0  4.00
Bob     Purkey     36   246  16  12  .571  245  118  102  116   51   1  3.73
Don     Drysdale   40   244  13  10  .565  236  111  100  182   83   0  3.69
Mike    McCormick  40   250  13  16  .448  235   99   89  163   75   0  3.20
Stan    Williams   41   235  15  12  .556  213  114  102  205  108   0  3.91
Joe     Gibbon     30   195  13  10  .565  185   85   72  145   57   0  3.32
Don     Cardwell   39   259  15  14  .517  243  121  110  156   88   0  3.82
Jim     Brosnan    53    80  10   4  .714   77   34   27   40   18  16  3.04
Bob     Gibson     35   211  13  12  .520  186   91   76  166  119   1  3.24
Ray     Sadecki    31   223  14  10  .583  196  100   92  114  102   0  3.71

 

              Don Cardwell, the guy who is 15-14 with a 3.82 ERA. . . he is the best pitcher in the league, actually the best pitcher in the majors, according to Baseball Reference WAR.   

              So if we could re-do the Cy Young vote with the Jacob deGrom chorus in full voice, would Cardwell and Kralick come out on top?   We’re all pretty well used to surprises in the stats by now, right?  

              We’re used to surprises in the stats, but not THIS surprising.   If there is a statistical pathway toward the conclusion that Baseball Reference wants us to reach, it would seem to me that it would be a very narrow and treacherous path.   You would have to buy ALL of their assumptions.  I hope Tom or somebody can get on here and tell me either (a) yes, we really believe that Kralick and Cardwell were better than Ford and Lary and Bunning and Spahn and O’Toole and Koufax, and here’s why, or (b) Uh oh; somebody typed a "40" where there was supposed to be a "4" and caused all of the measurements to go haywire.   I’m a stats guy, you know?  I’m used to mistakes.   If it isn’t a simple mistake, I’m not quite sure I am ready to go that far. 

 
 
 

COMMENTS (97 Comments, most recent shown first)

bjames
From Mr. Rising

This point relates (I think) to a struggle that I'm having: How does one react to a person who holds views I find distasteful or wrong but who never really acts on them (aside from an occasional statement) and is, in fact, a pleasant, nice, responsible person? I think we all have relatives like that: people with whom we differ on important issues but whom we love.

------

I would argue that this indicates, in most cases, that you don't truly understand the thinking of the other person. I think we all "act on" our beliefs, whether we choose to or not. If a person "holds" beliefs that you see as objectionable but does not act on them, it must be that there is a wrinkle in his philosophy that you have not understood.


I find that friends of mine constantly, on a daily basis, will attack their political opposition for hypocrisy or for being hypocritical, when really all it is that those on Side A of the debate don't understand the thinking of Side B.


All political philosophies are constructed not on a SINGLE principle, but on a series of principles, one weighed against another; in other words, Liberals believe in principles L1, L2, L3, L4, etc.; Conservatives believe in principles C1, C2, C3, C4, etc.; Socialists believe in principles S1, S2, S3, etc.


ALL political principles, without any exception, run into conflict with one another, forcing the liberal to choose between L1 and L2, the conservative to choose between C1 and C2. Resolving these conflicts is not "hypocrisy"; it is THOUGHT.
1:12 PM Jan 18th
 
MarisFan61
"Baked in" is pretty much exactly why that material about "men on base" and "leverage" seems to imply what I indicated -- and I'd love it for Tom (who's the one who really knows) or anyone else who understands it to answer what I've been asking.

It's like this: Kralick's fine numbers with men on base explain why he gave up so relatively few runs despite doing so atrociously badly with the leadoff hitter and despite putting so very many men on base (compared to the other pitchers we're talking about). To that extent, his good performance with men on base is already "baked into" his runs allowed.

In explaining how Kralick comes out so well in this ranking, Tom went out of his way to point out that Kralick did very well with men on base, and then added that men-on-base situations have double the "leverage" of bases-empty situations.

That seems to mean that the method gives extra consideration to men-on-base situations.
Which would mean, as I've said, that it inappropriately gives extra credit for a thing that's already "baked in" by being part-and-parcel of the number of runs that the pitcher gave up -- and creates a possible perfect storm when the pitcher happens to have been awful with the leadoff man and put so many men on base to begin with, which happens to be the story about Kralick for 1961, and therefore would seem to be part of the reason for this odd result.

Could someone please address this.....
12:31 AM Jan 11th
 
George.Rising
Bill,

I agree with you about the importance of keeping big-picture reasonableness foremost, above narrowly focused ideology, in both sabermetrics and politics. Quoting Mr. James: "Our political problems would dissolve into nothing overnight if people would simply realize that THEY are smarter than their political philosophies. All political philosophies, if implemented rigorously and systematically, would lead to horrific consequences for real human beings."

This point relates (I think) to a struggle that I'm having: How does one react to a person who holds views I find distasteful or wrong but who never really acts on them (aside from an occasional statement) and is, in fact, a pleasant, nice, responsible person? I think we all have relatives like that: people with whom we differ on important issues but whom we love.

Our society--especially social media--pushes the view that we must reject and "unfriend" all those with beliefs we find repugnant, regardless of how they actually behave on a daily basis. Conversely, we're encouraged to ignore the distasteful personal behavior of those we agree with--both Bill Clinton and Donald Trump come to mind on that one.
11:35 AM Jan 10th
 
tangotiger
Frank: excellent. Yes, stock valuation is interesting. In that case, the value of a stock is forward-looking, and so it corresponds to the "true talent level" of a player.

If they pay off a $1 billion penalty, that takes an immediate hit to the price of the stock (to the extent that it wasn't already priced in), but a month later, that's no longer part of the valuation. It's baked in.

So the analogy holds to some extent, but WAR has been approached as backward-looking, not forward-looking.
8:39 AM Jan 10th
 
FrankD
All: great discussion. I totally agree that we'll never generate the "One Stat" that explains all of baseball. Look at investing. How many stats have been invented to explain the true value of a stock? P/E? P/E/earnings growth? P/Sales? They all give a different picture of the value of a stock. And I dare say many more people have been searching for the Holy Grail of investing than for the "One Stat" of baseball.

All we can do (as Bill and others have said here) is hypothesize a model, run the model and review the results for glaring errors, publish the model and the results, and then adjust the model as more information and/or critiques warrant. This method has gotten us from before Newtonian Physics to Relativity and Quantum Mechanics, and we still don't have a unifying theory of Physics, but we are getting closer and closer. Note that we presently have two models: Relativity for Cosmology and 'big' things, and Quantum Mechanics for subatomic and 'small' things.
10:51 PM Jan 9th
 
tangotiger
So, how do we blend the three components of Season Score to create Quick WAR?

This is what we have so far:
Quick_WAR1 = (1.275*IP-ER-R)/20
Quick_WAR2 = (8*W-5*L+SV)/13
Quick_WAR3 = (K - BB) / 5 / 7.4

Let's say we weight them as 49.5% for WAR1, 32.2% for WAR2 and 18.3% for WAR3.
We'd get this:
Quick_WAR
= 0.495 * Quick_WAR1
+ 0.322 * Quick_WAR2
+ 0.183 * Quick_WAR3

= 0.495 * (1.275*IP-ER-R)/20
+ 0.322 * (8*W-5*L+SV)/13
+ 0.183 * (K - BB) / 5 / 7.4

= (1.275*IP-ER-R)/40.4
+ (8*W-5*L+SV)/40.4
+ (K - BB) / 5 / 40.4

= (1.275*IP-ER-R)
+ (8*W-5*L+SV)
+ (K - BB) / 5
all divided by 40.4

= Season_Score / 40.4

There you go: to convert Bill's Season Score into Quick_WAR, you take his Season_Score and divide by 40.4, or roughly 40.

And most importantly, it has an implied weighting of roughly 3/2/1 in the runs term, the wins term, and the SO/BB term.

Bill could easily have created three separate Season Scores, and let each of us blend them with our own weighting. But HE blended them to give us what he did. Which is fine. But it limits us, because it's not clear how all of this is derived.

Now you know, and now you know how to reweight them based on what you think it should be.
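A minimal Python sketch that checks the algebra above on four 1961 lines from the article's tables. It assumes Season Score is exactly the three-term sum written in this comment (Tango's decomposition, not an official definition), and it shows that the 49.5/32.2/18.3 blend lands within a few hundredths of Season_Score / 40.4.

```python
# Blend the three Season Score components into Quick_WAR, as in the comment above.
# Assumption: Season Score = (1.275*IP - ER - R) + (8*W - 5*L + SV) + (K - BB)/5,
# which is Tango's reading of the formula, not an official definition.

def season_score(ip, w, l, sv, er, r, k, bb):
    return (1.275 * ip - er - r) + (8 * w - 5 * l + sv) + (k - bb) / 5


def quick_war_blend(ip, w, l, sv, er, r, k, bb):
    """49.5% runs term, 32.2% won-lost term, 18.3% strikeout/walk term."""
    war1 = (1.275 * ip - er - r) / 20
    war2 = (8 * w - 5 * l + sv) / 13
    war3 = (k - bb) / 5 / 7.4
    return 0.495 * war1 + 0.322 * war2 + 0.183 * war3


# 1961 lines from the article's tables: IP, W, L, SV, ER, R, SO, BB
lines = {
    "Ford": (283, 25, 4, 0, 101, 108, 209, 92),
    "Spahn": (263, 21, 13, 0, 88, 96, 115, 64),
    "Kralick": (242, 13, 11, 0, 97, 101, 137, 64),
    "Cardwell": (259, 15, 14, 0, 110, 121, 156, 88),
}

for name, stats in lines.items():
    ss = season_score(*stats)
    # Each weight divided by its scale factor comes out near 1/40.4.
    print(f"{name:9s} SS={ss:6.1f}  SS/40.4={ss / 40.4:5.2f}  blend={quick_war_blend(*stats):5.2f}")
```

Changing the three weights in `quick_war_blend` is the reweighting described above; the divisor stays near 40.4 only while each weight keeps roughly its current ratio to its scale factor.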


9:37 PM Jan 9th
 
tangotiger
Finally, using just K and BB.

Quick_WAR = 0.27 * (K - BB) / 10
= 0.027 * (K - BB)
= 0.2 * 0.135 * (K - BB)
= (K - BB) / 5 / 7.4

This is his third component:
(K-BB)/5

Do you see it? We can do Quick_WAR on Season_Score_Component3 as:
(K - BB) / 5 / 7.4
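A short Python sketch of this strikeout/walk component, applied to four 1961 lines from the article's tables. It only restates the formula above; the point is that (K - BB) / 5 / 7.4 is (K - BB) / 37, or about 0.027 per net strikeout.

```python
# Quick_WAR from strikeouts and walks only, per the comment above.

def quick_war_kbb(k: int, bb: int) -> float:
    """(K - BB) / 5 / 7.4, which is (K - BB) / 37, about 0.027 per net strikeout."""
    return (k - bb) / 5 / 7.4


# (K, BB) from the article's 1961 tables
lines = {"Ford": (209, 92), "Koufax": (269, 96), "Kralick": (137, 64), "Cardwell": (156, 88)}

for name, (k, bb) in lines.items():
    print(f"{name:9s} {quick_war_kbb(k, bb):5.2f}")

# 0.2 * 0.135 = 0.027 and 1 / 37 = 0.02703, so the two forms nearly agree.
assert abs(0.027 * (137 - 64) - quick_war_kbb(137, 64)) < 0.01
```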


9:25 PM Jan 9th
 
tangotiger
This is how to do Quick_WAR with a W/L record:

Quick_WAR
= W - 0.385 * (W+L)
= 0.615 * W - 0.385 * L
= 13/13 * (0.615 * W - 0.385 * L)
= (8*W - 5*L) / 13

This is his second component:
8*W-5*L+SV

Do you see it? We can do Quick_WAR on Season_Score_Component2 as:
(8*W-5*L+SV)/13
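A minimal Python sketch of the won-lost derivation, using three 1961 records from the article's tables. It compares the rounded (8*W - 5*L + SV)/13 form with the starting expression W - 0.385*(W+L); the two differ only because 8/13 and 5/13 are approximately, not exactly, 0.615 and 0.385.

```python
# Quick_WAR from a won-lost record (plus saves), per the derivation above.

def quick_war_wl(w: int, l: int, sv: int = 0) -> float:
    """(8*W - 5*L + SV) / 13, i.e. roughly 0.615*W - 0.385*L + SV/13."""
    return (8 * w - 5 * l + sv) / 13


def quick_war_wl_long(w: int, l: int) -> float:
    """The starting point of the derivation (no saves term): W - 0.385 * (W + L)."""
    return w - 0.385 * (w + l)


# 1961 records from the article's tables: (W, L, SV); none of these three had saves
records = {"Ford": (25, 4, 0), "Kralick": (13, 11, 0), "Cardwell": (15, 14, 0)}

for name, (w, l, sv) in records.items():
    short, long_form = quick_war_wl(w, l, sv), quick_war_wl_long(w, l)
    # 8/13 and 5/13 are only approximately 0.615 and 0.385.
    print(f"{name:9s} {short:6.2f} {long_form:6.2f}")
```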

9:18 PM Jan 9th
 
tangotiger
Just to show you what I mean, we can construct Quick_WAR based on the three components of Season Score.

First, this is Quick_WAR:

www.tangotiger.com/index.php/site/article/quick-war

This is what rWAR looks like if we use Quick_WAR:
0.058*IP-ER/10

Which we can rearrange as:
(0.58*IP-ER)/10

Or as:
(1.16*IP-2*ER)/20

Or:
(1.16*IP-ER-ER)/20

Therefore, we can convert Bill's component along the same lines. This is one component of Season Score:
1.275*IP-ER-R

Do you see it? We can do Quick_WAR on Season_Score_Component1 as:
(1.275*IP-ER-R)/20

And we can do something similar to each of the three components of Season Score. And blend them in.
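A minimal Python sketch of the runs-based pieces above, applied to four pitchers from the article's 1961 tables. The formulas are the ones in this comment; the pitcher lines are the whole-number figures from the tables, and nothing is adjusted for park or fielding, so the outputs illustrate the formulas rather than reproduce Baseball Reference's WAR.

```python
# Quick_WAR from runs, following the formulas in the comment above.
# Assumption: IP values are the whole numbers from the article's 1961 tables.

def quick_war_rwar_style(ip: float, er: int) -> float:
    """Tango's Quick_WAR approximation of rWAR: 0.058*IP - ER/10."""
    return 0.058 * ip - er / 10


def quick_war_runs(ip: float, er: int, r: int) -> float:
    """Season Score run component (1.275*IP - ER - R) scaled by 1/20."""
    return (1.275 * ip - er - r) / 20


pitchers = {
    # name: (IP, ER, R)
    "Ford": (283, 101, 108),
    "Spahn": (263, 88, 96),
    "Kralick": (242, 97, 101),
    "Cardwell": (259, 110, 121),
}

for name, (ip, er, r) in pitchers.items():
    # The rearrangements in the comment are algebraically identical.
    assert abs(quick_war_rwar_style(ip, er) - (1.16 * ip - 2 * er) / 20) < 1e-9
    print(f"{name:9s} {quick_war_runs(ip, er, r):5.2f}")
```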



8:59 PM Jan 9th
 
tangotiger
Bill:

Exactly. This is why I like that Fangraphs and Baseball Reference have such different core bases. One starts with the output, and the other with the inputs. And they are trying to look at the same thing.

As an analogy: you want to know what a door looks like. And it might look different from inside the house than from outside. There might be some patterns visible only from outside and other patterns visible only from inside. Or maybe there's something hanging on the outside of the door that you can't see from inside.

If, say, both of them approached the problem the way Season Score does, where you consider all the components and give them strong weights, then we're kinda locked in.

While Bill James knows he can create a NOWOL version of Season Score, because we realize that many times we have zero use for W/L/SV, it's not apparent to the common user. He might even be scared to remove W/L/SV from Season Score altogether.

As long as the method is clear as to its driving force, then the user can decide to weight fWAR and rWAR however he thinks it should be weighted.

2:13 PM Jan 9th
 
bjames
I believe Steve's comment is correct in every respect.

The complexity of the real world always does and always will exceed the complexity of our statistical images of it. There are hundreds, maybe thousands of factors that you COULD consider in evaluating a pitcher/season. We sort through them, decide which ones are important, assign each one a weight. But the resulting model is far simpler than the underlying reality, and thus sometimes will be wrong.
12:40 PM Jan 9th
 
MarisFan61
Steve: Before we agree to lump all the major metrics together under 'none of them is free of anomalies,' I wonder if anyone would like to cite any anomalies coming from Win Shares that are quite on the level of this one.

Mind you, I don't know that there aren't. I'll be highly interested to know if there have been, and what they may be -- not just because it might affect how any of us look comparatively at Win Shares and "WAR" but also because, like this Kralick thing, it might point up specific issues about the system. (As far as I can remember, I've never seen any major result of Win Shares that's anywhere near the order of this one.)
10:46 AM Jan 9th
 
tangotiger
Steve:

Perfect.

10:14 AM Jan 9th
 
steve161
Correct me if I'm wrong, but it seems there is a consensus to the effect that the RA/9 approach to pitching WAR is flawed to the extent that it treats all pitchers as equally affected by team defense. In other words, it is an oversimplification.

The FIP approach to pitching WAR is also seen as flawed, in this case because it ignores the pitcher's responsibility for balls in play (or alternatively for sequencing). In other words, it too is an oversimplification.

Tom's way of dealing with this is to average the two WARs, which feels to me like the old gag about the two buckets of water. I know Tom is busy with other things, and I'm all for it, because I think Statcast is the best thing since peanut butter. But somebody needs to address this, to treat the various WARs as works in progress, rather than finished products.

Bill on the other hand deals with it by reminding us of the existence of Win Shares, though he concedes that it is devilishly hard to calculate. Nor does he seem interested in devoting more time and effort to finishing it. Not surprising, as he too has other fish to fry.

I'm skeptical that any Great Stat can be so refined that it doesn't generate anomalies, because baseball is wonderfully complex, especially the defensive side of it. My biggest gripe with WAR is not the framework itself, but the ease with which so many people cite that one number as if it were all we need to know. I believe it is capable of further refinement, though I doubt it will ever be the only answer we need.
6:16 AM Jan 9th
 
MarisFan61
......Supplementing the data on numbers of errors:

While the error data suggest much better fielding behind Kralick, BABIP data don't.

Twins pitchers overall BABIP-against: .280
Kralick BABIP-against: .301
Twins pitchers minus Kralick: about .276
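How a "minus Kralick" figure can be backed out of a team BABIP, as a small Python sketch: the team number is a weighted average over balls in play, so removing one pitcher means subtracting his weighted share and renormalizing. The 16% share of the team's balls in play used here is a hypothetical value chosen for illustration (it happens to reproduce the "about .276" above); the real share would come from actual balls-in-play counts.

```python
# Backing a "rest of the staff" BABIP out of a team figure.
# Assumption: KRALICK_SHARE is a hypothetical fraction of the team's balls in
# play, not a sourced number; the real value would come from BIP counts.

def babip_without(team_babip: float, pitcher_babip: float, share: float) -> float:
    """BABIP for the rest of the staff, weighting by share of balls in play."""
    return (team_babip - share * pitcher_babip) / (1 - share)


KRALICK_SHARE = 0.16  # hypothetical share of the Twins' balls in play

rest = babip_without(0.280, 0.301, KRALICK_SHARE)
print(f"Twins pitchers minus Kralick: {rest:.3f}")
```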
12:31 AM Jan 9th
 
tangotiger
Bill:

Right, that's exactly correct. WAR5050, if you like.


11:53 PM Jan 8th
 
bjames
Bill:

Right, which is why the step I've taken is to mitigate the issue by going with 50/50 Fangraphs / Baseball Reference.


But in doing this, are you in effect creating a new metric--r-fWar, perhaps? It seems to me that you are.
11:13 PM Jan 8th
 
MarisFan61
I don't at all understand the idea that if one is not offering a competing metric he has to accept the result to ANY extent.

Please, anybody, just stand back from it for a second and see if you don't see the absurdity of it.

All the time, every day, all of us see evaluations of whatever kinds of things (about baseball and not about baseball) in areas where we don't have a competing method of evaluation and never will, in fact usually in areas where we have no capability for developing our own method of evaluation. It doesn't at all mean we have to accept the result to any extent at all, and very often we don't and we feel we have good reason not to.
10:37 PM Jan 8th
 
tangotiger
Bill:

Right, which is why the step I've taken is to mitigate the issue by going with 50/50 Fangraphs / Baseball Reference.


10:32 PM Jan 8th
 
bjames
My problem however is that if I'm not going to offer a competing metric, then I have to accept the results to some extent.


The key phrase being "to some extent". Sure, we all have to stand by our new measurements to some extent, or throw them away. I'm just saying that it is a mistake to stand behind them ABSOLUTELY, thus failing to recognize that, at some point, another step forward will be available to us.
10:27 PM Jan 8th
 
MarisFan61
BTW, I gather it was more of a theoretical musing than an actual question, but I do have the data about errors by the team overall and with Kralick.

Overall, the Twins made 174 errors in 160 games, 1.09 per game.
In Kralick's games (all of which were starts), 26 errors in 33 games, 0.79 per game.
(So -- in the non-Kralick games, 148 errors in 127 games, 1.17 per game.)
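The arithmetic behind those three lines, as a small Python check. It only restates the counts given in this comment; it does not account for balls in play or innings pitched.

```python
# Error rates behind the 1961 Twins, using the counts in the comment above.
team_errors, team_games = 174, 160
kralick_errors, kralick_games = 26, 33

other_errors = team_errors - kralick_errors   # 148
other_games = team_games - kralick_games      # 127

for label, e, g in [
    ("All games", team_errors, team_games),
    ("Kralick starts", kralick_errors, kralick_games),
    ("Other games", other_errors, other_games),
]:
    print(f"{label:15s} {e:3d} errors / {g:3d} games = {e / g:.2f} per game")
```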
10:25 PM Jan 8th
 
tangotiger
Bill:

Right, I agree that it is one of the shortcomings of rWAR (WAR from Baseball Reference) that it gives a similar fielding adjustment to every pitcher on the same team.

This is something that I've railed against every year, and every year I find an example. I did so back with Verlander/Tigers in 2011 or 2012. I did so again this year: Nola and the Phillies. Nola had a .251 BABIP, and the Phillies' fielding was well below average, but he gets the same fielding adjustment as all the other Phillies pitchers.

So, I think you can make an even more powerful argument against rWAR, and more relevant for most people, by simply dealing with 2018. Especially since we have even more data for 2018 to further support the argument.

My problem however is that if I'm not going to offer a competing metric, then I have to accept the results to some extent.




9:50 PM Jan 8th
 
bjames
So, if we are going to have a discussion about Kralick, the first thing I would say: would your conclusion change if he was 23-2 instead of 13-12 (ceteris paribus)?




-----

I am quite certain that it would not. The critical issue is not won-lost record; it is how to interpret defensive support, and how to implement that interpretation in the analysis.


Let me suggest a step forward, and maybe we'll get to it in another generation. The system is assuming that Jack Kralick's defensive support is the same as the rest of his teammates--that is, very poor. But the '61 Twins had four starters, all pitching 200+ innings. Pedro Ramos was charged with 18 un-earned runs, Camilo Pascual 17, Jim Kaat 18, Kralick only 4. This would strongly suggest that the Twins committed far fewer errors with Kralick on the mound than with the other pitchers.


Let's assume that that is true. Well, we can get to that. Given the data we have now, we can certainly calculate how many errors were committed behind Kralick, how many behind Kaat, how many behind Pascual, etc. Thus, making a TEAM LEVEL adjustment on that issue is not the ultimate or perfect finish line for that adjustment.


Similarly, with modern data we could certainly calculate or otherwise research balls in play against each pitcher, and figure the Defensive Efficiency Record against each individual pitcher. Again, incorporating THAT data into the evaluation would almost certainly result in a more accurate evaluation, would it not?
9:24 PM Jan 8th
 
MarisFan61
I certainly agree that Tom (and anyone else) doesn't need to break down everything.
I've tried to present clearly what I wish he'd do (he or anyone else who knows enough about "WAR" to see within this odd result) to say what is/are the main elements of the system that vault Kralick ahead of the more obvious contenders. We've been trying to get at it here, also on Reader Posts. As we've said, we know 2 significant factors: Park Factor (which of course is in any capable major metric) and the counting of unearned runs equally with earned runs. But, as a number of us have kept noting, there's still a gap, and I'm not talking at all about the 'credibility gap' that led to this article, but only about the gap that still exists about understanding what it is about "WAR" that vaults Kralick up so far. I don't think anyone questions the basic use of Park Factor, and I doubt there's any big issue about the specific way that WAR does it (including that this is unlikely to account for any great difference between how WAR and other metrics see Kralick). About the unearned runs thing, many would quibble about what WAR is doing (at least the baseball-ref version, which is what kicked this off) and when all is said and done, I think this will probably be a major bone -- but it's unrelated to what I think we're reaching for now, because we already know about it being a factor in what we're trying to understand.

We're looking for what major thing/things account for the rest of the gap. As I've said, it seems that one of Tom's comments contains a big thing, although it sort of buries the lead. It appears that the comment (the one of 8:16 AM Jan 5th, the one that starts "This is what is perplexing the saber community"; others may understand this aspect better than I do) indicates that the system basically gives Kralick extra credit of sorts for having been real good at putting out fires that he happened to create a huge quantity of, without any penalty for having created them or for the ones he didn't put out -- i.e. by giving extra credit for good stats with men on base, although it didn't result in any real impressive low number of runs given up because of how many men he put on base in the first place. If I'm right about that, then indeed we've accomplished something basic about the WAR system: we've identified a flaw, one that's easily fixable or at least improvable. (If that's not what Tom's comment is saying, it's hard to see how that thing applies to Kralick's WAR showing at all. I invite Tom or anyone else who knows enough about WAR to acknowledge or clarify this.)
9:03 PM Jan 8th
 
tangotiger
This is getting into a discussion of how to disagree!

There are presumptions I placed there:

1. Baseball Reference is thoughtful and careful
2. Its presentation is thorough and comprehensive
3. They implemented a framework that I myself created or otherwise championed (I gave the blueprint of a house, they build the house with their materials and sweat)

On that basis, I am saying that if you reject it, there's enough there for someone to say why they reject the conclusion. There is a discussion to be had; there are ways to move the discussion forward. We don't have to start from the beginning.

When I disagree with Bill James for example, I have to come fully equipped. That's because I know Bill is meticulous and thoughtful. There are likely reasons he did things that I didn't figure out yet. But the parts I did figure, I can show why I disagree.

***

So, if we are going to have a discussion about Kralick, the first thing I would say: would your conclusion change if he was 23-2 instead of 13-12 (ceteris paribus)?
8:52 PM Jan 8th
 
bjames
Well. . .don't browbeat Tom to try to make him break everything down decimal point by decimal point. He's kind of living in a minefield, trying to be polite to all sides of the argument; you're picking on him for it.

I disagree with the view that if you disagree with a conclusion, you have a responsibility to say why it is wrong. We are much smarter than our formulas; therefore, our intuitive interpretations of the facts we know should be given a weight at least equal to the conclusions of a system. Let us say that an analytical system were to argue that Jed Lowrie was the Most Valuable Player in the American League in 2018. I can disagree with that without acquiring or accepting any responsibility to explain WHY it is wrong. I can simply say that I don't accept that conclusion or that my analysis does not coincide with it, and I'm going to move on.

People have done this to me for 40 years, right? Many, many people never read my stuff; they simply rejected it out of hand. That's fine. It doesn't do any harm. If you have something worth saying, it will get through over time.
8:36 PM Jan 8th
 
MarisFan61
re Tango: "If someone chooses to disregard a conclusion, while the model ALSO spits out all the assumptions, then the reader is really required to say which of those assumptions are invalid." --

If you've been paying attention, you know that I've been busting my butt trying to do exactly that.
You haven't made it easy by declining to say, with the benefit of all you know about "WAR" which is more than any of the rest of us know, exactly what are the elements of the system that give this odd result.
We've identified two of them. One of your comments seems to imply another, but you did a post on "Reader Posts" seemingly denying that (although you didn't say it directly).

If you want to contribute meaningfully to what we're trying to do here, you'll help us out a little more.
8:03 PM Jan 8th
 
tangotiger
If someone chooses to disregard a conclusion, while the model ALSO spits out all the assumptions, then the reader is really required to say which of those assumptions are invalid.

The WAR from Baseball Reference is a thoughtful and thorough system, with all of its assumptions laid out in a very transparent manner. I think it's incumbent on those who reject the Kralick output to say which of those assumptions in the model are unreasonable.

Having said that: I don't believe in the core of a system being based on RA/9 and "backing out" all the externalities like park and fielding. I also don't believe in the core of a system like Fangraphs that starts with FIP. The true answer must be in-between. So, I would say, at the least, generate the list of 1961 pitchers based on a 50/50 approach.

Also: neither system looks at W/L (or saves), so showing the W-L record as a data point to compare against will simply create an extra obstacle to account for.

Finally: NOWOL Season Score is made up of three components: the run portion, the FIP portion and the "Component ERA" portion. If you were to release it as SS1, SS2, SS3, you would get three different results. And on occasion, one will look horrendous. Which is why we don't have THREE distinct versions of Season Score, but a merge of those three components. What it's doing under the hood is hidden to some extent.

The real issue is that rWAR and fWAR are two distinct WAR models that each stand on their own. That's the unfortunate part. But if you do (rWAR+fWAR)/2, then we basically aren't having this conversation, because we would have no issue.




7:10 PM Jan 8th
 
bjames
Maybe I didn’t quite get to what I was trying to say a few minutes ago. Choosing to not believe the outcome of a formula or set of formulas that you have created is a recognition that the real world is more complicated than your formulas have adequately represented. It is a way of saying that “I know things that I have not yet figured out how to put into the system.” This is absolutely appropriate, because in fact you DO know things which you have not yet figured out how to put into the system. It is inappropriate to refuse to do this, because the refusal rests on something false: it implicitly says that your current formula is smarter than you are, when in reality your formula can NEVER be smarter than you are; the formula is merely a way station on the road to some better set of formulas.

There is a political import to this realization. Our political problems would dissolve into nothing overnight if people would simply realize that THEY are smarter than their political philosophies. All political philosophies, if implemented rigorously and systematically, would lead to horrific consequences for real human beings. You may believe in strong border security, but still realize that something has gone horribly wrong when children are separated from their parents at the border and not returned to them. Or you can believe in open-borders immigration policies, and still realize that something has gone horribly wrong when an illegal immigrant, released from custody after a previous arrest, murders a policeman. The insistence that our sets of ideas, our intellectual constructs, are somehow smarter than we are drives us toward irrational extremism in politics, just as it could in the mathematical analysis of baseball.

4:19 PM Jan 8th
 
bjames
tangotiger
You can NOT start picking and choosing which answers you like.


I would argue that in fact you have a responsibility to pick and choose the answers you like, that all scientists in fact do this, and that it is a normal part of the scientific process.


Science relies on a process of simplifying complex problems so that you can create models or images of them. Sometimes you can reduce problems to extremely simple models which provide clear and convincing evidence about the complex issue. But 95% of the time, you can't. 95% of the time, you create models which provide satisfying and convincing answers SOMETIMES, and do not do so other times.

The question then becomes, "What do you do when your model produces an answer which reasonable people do not accept, and which you choose not to accept?" The WORST thing that you can do in that situation is to insist that your model is right and your intuitive logic is wrong. The right thing to do is exactly what Tom says (here) that you can't do: choose not to believe the answer provided by the model.

When you choose not to believe the answer provided by the statistical image of the problem, then you face the question "What did we miss? Why did my system, which I believe in and which usually works. . .why did it not work THIS TIME? What ELSE do I need to do, to move on to a better model?"

Asking that question is essential to scientific progress. All scientists do it; all people devoted to knowledge do it. Tom does it all the time. It is absolutely appropriate.

3:41 PM Jan 8th
 
bjames
Responding to Steven Goldleaf:


I wonder how much of this questioning of Kralick and Cardwell derives straight out of the fact that they never became (or had been) dominating pitchers. What I mean is, if Koufax in 1959 or Spahn in 1965 were shown to have dominated the NL in those heretofore-undominating years, would we just accept it as being consistent with the rest of their careers, and would we easily agree to accept the "new" findings?



An example of this is that I actually have done different lines of analysis which argue that Bob Gibson in 1962 was the best pitcher in the National League. Gibson was like 15-13, 2.85 ERA. . . .not superficially equal to Drysdale. It has SOME extra credibility because it's Gibson, who was later proven to be a superior pitcher.



3:32 PM Jan 8th
 
George.Rising
Re: Kralick 1961 vs. deGrom 2018

Building on what Dan (DMBBHF) wrote below: It's easy to see that deGrom had great stats in 2018 in all categories--except for W-L. It's an easy "sell" to recognize his greatness that year, despite his mediocre W-L record. By contrast, it's not an easy sell for Kralick's 1961 stats.

If we go deeper and analyze deGrom's W-L on a game-by-game basis, we find that the W-L is unrepresentative of all of his other stats, particularly runs and ER. In his 9 LOSSES, deGrom had 3 or fewer runs and 3 or fewer ER each time he lost! His RA was 3.14 and his ERA was 2.71 in his LOSSES. In his 14 NO DECISIONS, he gave up 0 or 1 run 10 times! Plus 3 twice and 4 once. His RA and ERA were both 1.62 in his NO DECISIONS. And, of course, in his 10 WINS, he was remarkable: 3 or fewer runs every time, and 2 or fewer ER every time! 1.40 RA and 0.89 ERA in WINS.

In sum, that is plenty of evidence to argue that deGrom's W-L was fluky (fluky-bad) and not representative of how well he pitched based on all of his other stats, especially the key stats of runs and ER. Based on those, he probably should have been 20-5 or something like that.

By contrast, Kralick's game-by-game analysis does not lead us to a conclusion that his 13-11 record in 1961 was as fluky-bad. He gave up 4 or more ER in 14 starts. This includes giving up 8 ER in one loss and 5 ER in three other losses. He also gave up 6 ER in a win and 6 ER in a no-decision. However, he did lose 5 games where he gave up only 2 or fewer ER.

I understand (and strongly support) adjusting for park and for the year's hitting context. But even those adjustments would not close the gap between deGrom's 2018 and Kralick's 1961. Overall, Kralick had an above-average year, and 13-11 might underrate him a bit--but nowhere near as much as deGrom's record underrated him in 2018.


3:03 PM Jan 7th
 
MarisFan61
I'm afraid that some of our members, the ones I'm (sort of) arguing with, might be viewing these comments as a nuisance.

Granting my bias about my own comments, let me say, that would be a mistake.
I'm helping toward an understanding of this odd finding -- and it's even possible that what I'm going to say now will help toward a refinement of the "WAR" system.

Repeat: may help toward a refinement of the "WAR" system. :-)

As we continue talking about the oddity of the pitcher who shows as the top pitcher in the league having such an unimaginably God-awful split against leadoff batters, and thinking more about Tom's having done that post about Kralick's record with men on base, I had to wonder, does the system (at least as used by baseball-ref) give special consideration to performance with men on base, over and above its obvious consideration of runs given up?

I guess it does.
And, looking back at that comment of Tom's, the one of 8:16 A.M. Jan 5th (which I mention to help anyone find that comment if you want to), we see this:

".....the Leverage Index with runners on is 2x that of bases empty."

I think we've just discovered the missing link. :-)

And it's a thing that I think the WAR people might be well advised to reconsider in how they do their metric -- unless the leadoff situation is given extra "leverage" over other bases-empty situations.
(I'm pretty much assuming it isn't.
Why??
Because of this Kralick season.
If leadoff situations were included, I don't think it's possible that Tom would be letting himself talk about Kralick's fine performance in leverage situations.)

As Bearbyz said, very arguably 'leadoff' IS a high-leverage situation, inherently.

We've already identified Park Factor and the unearned runs thing as two major factors in vaulting Kralick ahead.
But there was still a gap.

It's looking quite possible that most or all of that gap comes from an excessive weighting of performance with men on base, and (perhaps) an ignoring of the importance of doing putridly against the leadoff man.

So, it could be that this season of Kralick is a 'perfect storm' for creating a bad result -- but, not necessarily a bad result that's just an aberration but one which points up issues about the "WAR" system that deserve to be addressed.

We know that there's disagreement about whether it's a good idea to count unearned runs equally with earned runs. Maybe they want to re-think that. I have no strong feeling on that; I don't know which way is better.

We know that some people have doubts about whether it's good to use Park Factor as the metrics do.
I don't see any issue with how WAR or any other metric uses Park Factor. I think they do fine on that.

So, Kralick gets bumped up because his runs-allowed numbers are better than they look, because of Park Factor.
No problem.
He also does better in this system than it might look like he should from his ERA or ERA+ because he gave up so few unearned runs.
OK, I suppose; arguable but interesting and let's say OK.

But: If he's getting bumped up further because some versions of WAR give extra emphasis according to 'leverage' but no particular great weighting to leadoff, that's just nuts, and it needs to be fixed.

Repeat: Needs to be fixed. :-)

And, in case this isn't clear, I say it least of all to fix this ranking. I liked Kralick and, to tell the truth, got a kick out of seeing him coming out #1.
It's to fix the system.
The Kralick result, if it's due in part to that thing about leverage (and if the details of it really are as I've said), serves to point up a poor aspect of the system. That's what's important about it.

Perhaps this aspect of the system works OK most of the time. I guess it does or else there would have been larger "elephant tracks" to the contrary.

But, in order to see that it's an absurdity beyond just this Kralick result, just think of it in the theoretical terms:
The fact of a pitcher doing well with men on base hardly adds anything beyond what we already know from his runs-allowed, and his "leverage" assessment isn't real complete or real relevant if you're ignoring the leadoff situations.
9:12 PM Jan 6th
 
MarisFan61
.....In terms of A.L. players of that year, closest is:
Al Kaline .324/.393/.515

Kralick's leadoff hitters of the inning were basically Al Kaline with a few fewer walks.
5:05 PM Jan 6th
 
MarisFan61
For that slash line:
Closest I see, in terms of career record, is Goose Goslin.
Or, George Brett, if you add some hits and a fair extra amount of power.
That's what leadoff hitters in the inning did against Kralick that year.

Against such a background, he needed to have good success with men on base in order to survive, in order to not give up huge numbers of runs. It's not an explanation for why he does better on a metric than you'd think from his numbers of runs given up.

Sorry to harp on this leadoff man thing. I'm doing it because those other guys, unaccountably, are completely ignoring it.

In fact, y'all probably wouldn't even know about this extremely interesting split -- especially for a pitcher who comes out as #1 on something like "WAR" -- if not for my persistent obnoxiousness.
4:32 PM Jan 6th
 
MarisFan61
Bear: Well said. (Although I'm not sure those other guys will think so.) :-)

But while that depends on what we mean by "clutch," the essence of it for what we're looking at here doesn't. The fact of a pitcher having good numbers with men on base isn't any kind of selling point when he's putting so many men on base to begin with, and when he turns his average first hitter of the inning into....well, let's make it a little exercise:
What hitter in history does this slash line most resemble?
(It's not a quiz, there's no right answer, and I don't know yet who I'd say.)
.325/.373/.524

Sure, it helped limit his runs-given-up.
But it adds absolutely nothing to our discussion, because we already knew about his runs given up, including the very low number of unearned runs which helps compensate for a higher E.R.A. than these other pitchers, and adjusting for Park Factor also helps him.
What we needed (and still need) are things that would supplement this and help explain the ranking.
The "runners on base" material was astonishingly irrelevant.
4:17 PM Jan 6th
 
bearbyz
I look at facing the lead off hitter in an inning as a high leverage situation, so I don't think Kralick was that great in the clutch.
3:45 PM Jan 6th
 
tangotiger
jgf: yes, that's a good synopsis.
9:09 AM Jan 6th
 
MarisFan61
I looked at a few things relating to Kralick's low number of unearned runs, a series of questions I thought of, with some work done to answer them. I was going to post it here but it wound up being too long even by my standards :-) .....so instead I'm posting it on Reader Posts. If you're interested and are inclined to look there, you won't have any trouble finding the thread.

If anyone wants me to put a short version of it here, just say.
I ought to add, though, it's not like it necessarily goes anywhere, doesn't necessarily shed any insight on the WAR ranking, just a series of questions and answers.
12:11 AM Jan 6th
 
MarisFan61
JGF: I didn't imagine I was duplicating BB-Ref calculations or anything else. I wasn't interested in that and couldn't do it if I tried.

All I've been doing in regard to that is trying to discern (or, better, have someone just say, in a plain English nutshell) what conceptual things within it might significantly account for Kralick ranking how he does.

And, as I said, I see that Park Factor and the counting of unearned runs equally to earned runs do account for a lot of it, but with a gap still remaining.

-----------------

Let me add -- I think I said an abbreviated version of this before -- this thing I keep harassing about, 'Please put it in a plain-English nutshell what the things are in the method that give this surprising result'.... It's nothing more than what Bill James routinely does when his methods give a hugely surprising result. It's routine for him, and to me it's a huge appeal of his work. In fact, I'd say it also very much reflects what's good about his work. He's always thinking "what does this mean, where's it coming from, what's it about, what's going on." It's never just numbers that get spit out and there you are, folks, that's that. He wonders about them, just like he undoubtedly wonders about those aspects all the way through developing and refining whatever system -- so it's just natural, when there's a seemingly odd result, to wonder and to say what about the innards of the method made this happen.

When I see an absence of that kind of thing from other methods, I can't help getting a bit of a sinking feeling about the method, not just because the lack of good comment is frustrating but because I can't help thinking that it reflects a relative lack of that kind of thinking in the construction of the metric.
12:05 AM Jan 6th
 
jgf704
Also, Maris, if whatever calculations you are doing end up showing that Ford is better than Kralick, then you aren't correctly duplicating the BB-Ref calculations.
10:41 PM Jan 5th
 
jgf704
Part of the confusion here, IMO, is that we are discussing 2 different things:

1. Why rWAR and fWAR give different numbers (and relative rankings).
2. Why rWAR gives results counter to intuition.

I think many of Tango's comments have addressed #1, while I think many others (especially Maris) are concerned with #2.

This has been a fun discussion for me, as it has resulted in my gaining a deeper understanding of the details behind both rWAR and fWAR, as well as some of the higher level differences that Tango has been discussing. And to restate his points (through my own lens):

* fWAR assumes identical sequencing of events and identical defense for all pitchers. Moreover, the "identical defense" assumption is a fairly deep one -- it means that balls in play result in the same distribution of results (outs, singles, doubles, triples) for all pitchers.

* rWAR includes the effect of sequencing, but tries to remove the effect of defense.

Both account for park effects.
10:39 PM Jan 5th
 
shthar
Gonna go out on a limb and bet that no-hitter was against the A's.


6:57 PM Jan 5th
 
MarisFan61
JGF: About RA9avg being "pitcher dependent," I do see what that means.
5:16 PM Jan 5th
 
MarisFan61
....In terms of the basic "what is it exactly in plain English," what I can see so far is that Park Factor gives him a significant upward adjustment relative to the guys who would seem to be the main contenders, as much as 12-13% compared to Ford and Lary, a bit less compared to Pizarro.

And, his small number of unearned runs gives him a benefit of....well it varies a lot, because the other guys' unearned runs vary a lot.
Compared to Ford, it's about an extra 3% bump.
Compared to Lary, it's about a 14% extra bump.

At least for the moment I think we can ignore the thing about the "runners on base" split, until Tom or anyone replies to what has been noted about that, because it appears not to be any significant thing. Kralick's relatively high WHIP and his unusually poor performance against leadoff hitters must have created an unusually high quantity of such situations. Doing well in those situations doesn't ameliorate whatever is shown by pure-runs numbers; it just keeps those numbers from being much worse.

Putting the Park Factor and unearned runs considerations together, it looks like that's about a 16% upward bump for Kralick in comparison to Ford, and about a 28% bump compared to Lary.
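The 16% and 28% figures come from compounding the two edges rather than simply adding them. A minimal Python sketch of that arithmetic, assuming a 12.5% park edge as the midpoint of the quoted 12-13% and taking the 3% (Ford) and 14% (Lary) unearned-run edges from the comment; `combined_bump` is a hypothetical helper name:

```python
def combined_bump(*edges: float) -> float:
    """Compound several fractional edges (0.03 means 3%) into one total edge."""
    total = 1.0
    for edge in edges:
        total *= 1.0 + edge  # each edge scales whatever came before it
    return total - 1.0


PARK_EDGE = 0.125  # assumed midpoint of the "12-13%" park adjustment

vs_ford = combined_bump(PARK_EDGE, 0.03)  # park + unearned-run edge vs. Ford
vs_lary = combined_bump(PARK_EDGE, 0.14)  # park + unearned-run edge vs. Lary

print(f"Kralick vs. Ford: +{vs_ford:.0%}")
print(f"Kralick vs. Lary: +{vs_lary:.0%}")
```

This prints roughly +16% and +28%, matching the comment; simply adding the edges would give 15.5% and 26.5% instead.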

Does it look like that's enough to vault him ahead of them?

WELL, SORT OF YES IT DOES.

Tentatively (because I'm not capable of more than tentatively) I'm saying, in the plain English that I've been hankering for, it looks like it's very largely Park Factor and counting unearned runs equally with earned runs.

The reason I say only "sort of" is that it doesn't really account for Kralick being ahead of Ford, but it puts him in the running.

These noted adjustments move Kralick about 3% ahead of Ford.
But Ford pitched 17% more innings, which, absent any other countering factors, puts Ford 11% ahead on overall value.

These noted adjustments move Kralick about 11.6% ahead of Lary.
Lary pitched 13.7% more innings.
Since the second number is larger than the first, putting these together still leaves Kralick behind Lary by a couple percent (really 2%) on overall value.

I think it would have helped for it to be stated right away, "It's very much Park Factor plus counting unearned runs equally with earned runs."

I think it's still unanswered what the rest is that moves Kralick up.
Some things have been speculated, some things have been said generically (like, teams faced), but I don't think we know yet how or whether they apply to Kralick.
2:53 PM Jan 5th
 
tangotiger
rwarn:

Right.

This is no different than estimates of inflation rate or unemployment rate or anything else. We are not actually counting every month the employment status of 100 million adults. Nor are we doing the same for the untold millions of goods and services. There is not even a consensus as to WHAT goods and services to count, since each industry has its own view.

WAR is about estimating how much each player contributed. And so you need multiple perspectives.

Fangraphs offers one and BRef offers another. And where they differ, it indicates the kinds of assumptions each makes.
2:05 PM Jan 5th
 
MarisFan61
Tango: That page doesn't address any of the additional questions that have been raised.
Actually I don't see that it adds to what we'd had previously on this page.
1:53 PM Jan 5th
 
rwarn17588
These two examples -- Kralick and Cardwell -- show the flaws of even really good ratings systems. If WAR is deemed 99 percent accurate, that means you're going to have flubs once in a century or so.

And I doubt WAR is that good, though I really like it because it's a comprehensive look at a player.

I think once we consider the notion that no rating system is 100 percent accurate, then we will look at something with a somewhat skeptical eye and examine other ratings for a more complete picture -- which is probably what we should be doing in the first place.
1:42 PM Jan 5th
 
tangotiger
I had a long discussion on Twitter. I collected all my posts and they are in the comments section of my blog:

www.tangotiger.com/index.php/site/comments/runs-on-the-knights-watch
1:17 PM Jan 5th
 
MarisFan61
Yes, we'd accept it a lot more easily -- and, I say, there'd be a much higher chance that it would be well taken.
12:40 PM Jan 5th
 
Steven Goldleaf
I wonder how much of this questioning of Kralick and Cardwell derives straight out of the fact that they never became (or had been) dominating pitchers. What I mean is, if Koufax in 1959 or Spahn in 1965 were shown to have dominated the NL in those heretofore-undominating years, would we just accept it as being consistent with the rest of their careers, and would we easily agree to accept the "new" findings?
11:56 AM Jan 5th
 
tangotiger
Cy Young Tracker:
https://www.baseballmusings.com/cgi-bin/CyYoungTracker.py?EndDate=10/10/1961&SortField=((SS.FIP/6.0)+-+SS.ER)+++(SS.Ks/10.0)+++SS.Wins&SortDir=desc

So using the 2018 Cy standards, Kralick is nowhere there. Ford stands alone.

Lary, O'Toole, Spahn, Bunning, Koufax would each get 2nd place votes.

If you sort by Season Score, so basically the standards of 20 years ago: Ford stands alone as well. Then Lary would be an easy second among starting pitchers.
11:37 AM Jan 5th
 
ventboys
I think my intuition mostly matches the group, in that I don't think Kralick and Cardwell were, in fact, hidden gems so much as anomalies, but how much can I trust my intuition?

What percentage of intuition is simply confirmation bias?
11:07 AM Jan 5th
 
MarisFan61
Yes :-) but remember, even with the Park Factor adjustment, his "E.R.A.+" was just 15th in the league, and very far behind some of the other contenders.
10:55 AM Jan 5th
 
steveperry9
Mrs. Kralick should thank God for the Park Factor adjustment. Otherwise he (big benefit) and Ford (big penalty) would not even have been close.

Of course I believe in Park Factors, but sometimes their impact seems a bit overstated. I'm sure the analysis is correct, but sometimes correlations can just be spurious, or ballplayers make smart adjustments to better fit their park. Should they then be statistically punished for using their heads?

Guess we're all waiting for Tom Tango's response.
10:22 AM Jan 5th
 
MarisFan61
Because of the length of that post, here's a Cliff's Notes:

-- I don't see how his good clutch split counters his significantly higher E.R.A.; it didn't keep him from giving up lots of clutch runs, because he put so many runners on base. It made him give up fewer runs than he might have; it didn't make him have a real good record of few-runs-given-up, either clutch or overall.

-- Separate issue at the bottom about the fielding behind him.
10:19 AM Jan 5th
 
jgf704
The link below shows the RAR_1 and RAR_2 for AL pitchers, along with the difference due to context, and due to unearned runs. The pitchers are ordered by RAR_2.

RAR_1 = IP/9 * (ERA_lg +1 - ERA) * (RA_lg/ERA_lg)
RAR_2 = IP/9 * (RA9avg + 1 - RA)

https://i.imgur.com/13HYeHA.png
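A direct Python transcription of the two formulas above, as a sketch. The helper names are hypothetical; the inputs use Kralick's 242 innings and 3.61 ERA from the article and the 101 runs allowed cited elsewhere in the thread, but the league values (`ERA_LG`, `RA_LG`, `RA9AVG`) are placeholders, not actual 1961 figures. The `+ 1` term appears to set replacement level one run per nine innings worse than the baseline.

```python
def rar_1(ip: float, era: float, era_lg: float, ra_lg: float) -> float:
    """ERA-based runs above replacement, rescaled from earned runs to all runs."""
    return ip / 9 * (era_lg + 1 - era) * (ra_lg / era_lg)


def rar_2(ip: float, ra: float, ra9avg: float) -> float:
    """Runs-allowed-based runs above replacement vs. a pitcher-dependent baseline."""
    return ip / 9 * (ra9avg + 1 - ra)


IP = 242.0
ERA = 3.61
RA = 101 * 9 / IP  # runs allowed per nine innings

# Placeholder league context, NOT real 1961 numbers.
ERA_LG, RA_LG, RA9AVG = 4.00, 4.50, 4.50

print(f"RAR_1 = {rar_1(IP, ERA, ERA_LG, RA_LG):.1f}")
print(f"RAR_2 = {rar_2(IP, RA, RA9AVG):.1f}")
```

Because RAR_2 works from total runs and a pitcher-specific baseline, a pitcher with very few unearned runs or a favorable context adjustment can gain ground relative to RAR_1, which is the kind of difference (context, unearned runs) the linked table breaks out.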
10:19 AM Jan 5th
 
MarisFan61
A couple of things:

I also noticed the split mentioned by Tom -- also the unusually good "leverages" split -- and wondered if those might be big factors in the outcome.
But I figured they couldn't, because, still, look at how high his E.R.A. is.

(I don't mean "high high," just high in relation to the other pitchers we're looking at, even if we're adjusting for Park Factor and so we're looking at ERA+.
He was 15th in the league on that, with 117. Don Mossi, who pitched just about the same number of innings, was 139, but comes out with 2 full "WAR" fewer [4.0 vs. 6.0]. Bunning, with more innings, was 129. Frank Lary, with more innings was 127.....
BTW Ford was below Kralick, just below, but with many more innings.)

AND THERE'S MORE:
This next thing reminds me of a funny line by Bill in one of the early Abstracts. He quoted some manual saying that Bill Buckner defines a tough out, and added, "Yes, and defines it very frequently."

Compared to the other pitchers we're talking about, he put loads of men on base.
He was particularly putrid with the leadoff batter.


In terms of trying to judge 'how good' he was, his good "clutch" performance helps, of course, but, to begin with, I have trouble seeing how it should undo the significantly higher ERA (and ERA+) than the other pitchers, and further, well great, he did well in clutch situations, but he created a lot of them by putting so many men on base. It doesn't mean he didn't give up lots of runs in clutch situations, just that he didn't give up as many as he would have if he hadn't often rescued himself from worse.

Here's the league average for leadoff hitters:
.255/.324/.399

Here's Kralick's record that year with leadoff batters:
.325/.373/.524
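To put the two slash lines on one scale, here is a small Python sketch comparing raw OPS (OBP + SLG). This is unadjusted OPS, not OPS+, and the helper name is hypothetical:

```python
def ops(obp: float, slg: float) -> float:
    """On-base plus slugging: the OBP and SLG terms of a slash line."""
    return obp + slg


league_leadoff = ops(0.324, 0.399)  # league average for leadoff hitters
vs_kralick = ops(0.373, 0.524)      # leadoff hitters against Kralick

print(f"League {league_leadoff:.3f}, vs. Kralick {vs_kralick:.3f}, "
      f"{vs_kralick / league_leadoff - 1:.0%} higher")
```

That works out to .897 against .723, about 24% higher.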


And, not unrelated, his "WHIP" was much higher than most if not all of those other pitchers. He was terrible (TERRIBLE) against the 1st batter of innings, and, relative to these other guys, he put lots of guys on base.

So, great: He did well in the clutch situations. It made his ERA better than it might have been, and it won games. But it didn't make the overall picture great.

I don't mean to be suggesting that "record vs. leadoff hitter" should be part of the metric, or that "WHIP," directly or indirectly, should play a greater role.
All I'm saying is what I'm saying: The "clutch" stuff is nice, but in such a context, it's more like a sarcastic "whoopee" than a thing that adds up to any kind of feather in the cap.

-----------------------------------------

Separate thing:

I can't tell if it's been agreed that part of what elevates Kralick above what we might have expected is that the system believes he had poor fielding behind him.
It was mused about in the first comment.
Is it so?

I know that Kralick's very low number of unearned runs helps him.
Just wanted to note that if that other thing is true, these two things are in conflict. Not flat-out contradictory, but severely in conflict.
10:04 AM Jan 5th
 
DMBBHF
Follow up on my last post....

In applying that formula retroactively to 1961, Cardwell would have been 10th in the NL, and Kralick would have been 15th in the AL. Applied retroactively using today's standards, it implies neither one would have been a serious Cy Young award contender.

Thanks,
Dan
9:39 AM Jan 5th
 
DMBBHF
Interesting reading the discussion. Thanks, Tango, for all the insight.

Although most of the conversation has been around baseball-reference.com's version of WAR, there are 3 excerpts from Fangraph's library definition that I particularly like and I always try to keep in mind:

Wins Above Replacement (WAR) is an attempt by the sabermetric baseball community to summarize a player’s total contributions to their team in one statistic.

You should always use more than one metric at a time when evaluating players, but WAR is all-inclusive and provides a useful reference point for comparing players.

WAR is not meant to be a perfectly precise indicator of a player’s contribution, but rather an estimate of their value to date.


I think it's useful to always keep those things in mind, because WAR can't really provide the definitive answer to the question of who is "best" or who deserves to win an award (at least not by itself). I think it's an important part of the conversation, but it's not the ultimate arbiter. To me, it would be like using net worth alone to determine who is the "most successful" person or using something like GPA or IQ alone to decide who is the "smartest" person. Those are all things to consider, but none of them stands by itself in trying to decide the final order of things.

WAR, whether it's fWAR or rWAR, strives to account for as much as possible into one easy-to-digest number, and it's very useful for a great many things, but it doesn't stand alone. At the end of the day, even though it accounts for many different things and is a wonderful tool for analysis, it's still just one piece of evidence.

Second point:

When Bill asked the question "So if we could re-do the Cy Young vote with the Jacob deGrom chorus in full voice, would Cardwell and Kralick come out on top? ", I think the answer is no, or at least, no for right now.

For one thing, I don't think Cy Young voters in particular pay all that much attention to WAR at this point. deGrom wasn't the leader in rWAR (although he was in fWAR). Aaron Nola had more rWAR than deGrom did, but finished third in the voting. deGrom, in my view, won the award because he had such a huge gap in ERA vs. his competition, and he had that long streak of starts allowing 3 or fewer runs, and all of that more than compensated for the lack of wins. It's not that pitcher wins don't count, it's more that there are other things to consider.

I really like Tango's Cy Young prediction formula that he has published: Cy Young Points = (IP/2 - ER) + SO/10 + W. That seems to be doing pretty well in predicting the outcomes and identifying the top contenders, and it doesn't incorporate WAR at all.
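The formula is simple enough to check by hand; a Python sketch follows. Kralick's line (242 IP, 137 SO, 13 W) is from the article, with earned runs back-computed from his 3.61 ERA as 97; Ford's 1961 line (283 IP, 209 SO, 25 W, 3.21 ERA, so about 101 ER) is added for comparison. Back-computing ER from ERA and rounding is an approximation, and the helper names are hypothetical.

```python
def cy_young_points(ip: float, er: int, so: int, wins: int) -> float:
    """Cy Young Points as quoted above: (IP/2 - ER) + SO/10 + W."""
    return (ip / 2 - er) + so / 10 + wins


def earned_runs_from_era(era: float, ip: float) -> int:
    """Approximate earned runs from ERA and innings (ER = ERA * IP / 9), rounded."""
    return round(era * ip / 9)


kralick = cy_young_points(242, earned_runs_from_era(3.61, 242), 137, 13)
ford = cy_young_points(283, earned_runs_from_era(3.21, 283), 209, 25)

print(f"Kralick {kralick:.1f}, Ford {ford:.1f}")
```

That gives about 50.7 for Kralick and 86.4 for Ford, in line with the point that Ford comes out on top under this formula.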

If we applied that formula retroactively to 1961 (which, I realize it's not really designed to, but still, I thought I'd see what it implied), it would have predicted Whitey Ford in the AL and Jim O'Toole or Warren Spahn in the NL (of course, there was only one combined MLB Cy Young award granted in 1961).

Maybe WAR will become a bigger part of the discussion in future Cy Young awards....maybe not. I think we're still at a point where voters prefer to look across a spectrum of categories rather than relying too much on the summary stat, and I think that's a healthy approach.

Thanks,
Dan



9:30 AM Jan 5th
 
tangotiger
Let me make a correction: Kralick is #1 in context-adjusted RAA at +37. His unadjusted RAA is +22 runs. He gains 15 runs for his context.

Spahn leads at +38 RAA unadjusted and is +13 adjusted, for a context adjustment of -25 runs.

Is it possible that these are poor estimates? Sure. BRef is laying it out there though, and the big one is the park factor, with the next big one being fielding.

This is the purpose of what we do in sabermetrics. We try to isolate these contexts. The mere fact that the context adjustment is large is not, by itself, reason to distrust anything.
9:00 AM Jan 5th
 
tangotiger
This is what is perplexing the saber community when it comes to separating fielding from pitching: we can identify WHO is there, but we can't assign RESPONSIBILITY well enough. You start with simply ONE game. You have a perfect game, and so it's 4 runs better than average and 5 runs better than replacement. But is the pitcher responsible for ALL of it? We've watched enough baseball to appreciate that there's a lot of randomness. So, are perfect games usually 3 runs or 2 runs better than average for a pitcher? And are they 1 or 2 runs better than average for fielders?

So that randomness, while it starts to wash away over a season, doesn't completely wash away.

Jack Kralick in 1961 had this split with bases empty and runners on, respectively:
.292/.341/.429
.253/.297/.358

The OPS of those numbers is 14% higher than the league's with bases empty and 22% lower than the league's with runners on. And the Leverage Index with runners on is 2x that of bases empty.
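A rough Python illustration of why the runners-on split matters so much once those plate appearances count double. This is only an illustration of leverage weighting, not how Baseball-Reference actually computes anything: relative OPS stands in for run value, the 55/45 split of plate appearances between bases empty and runners on is a hypothetical placeholder, and the function name is invented for the example.

```python
def weighted_relative_ops(rel_ops: dict[str, float],
                          pa_share: dict[str, float],
                          leverage: dict[str, float]) -> float:
    """Average relative OPS across base states, weighting each by PA share * leverage."""
    num = sum(rel_ops[s] * pa_share[s] * leverage[s] for s in rel_ops)
    den = sum(pa_share[s] * leverage[s] for s in rel_ops)
    return num / den


# Kralick's OPS relative to league: 14% higher bases empty, 22% lower with runners on.
REL_OPS = {"empty": 1.14, "runners_on": 0.78}
# Hypothetical plate-appearance split; the real split is not given here.
PA_SHARE = {"empty": 0.55, "runners_on": 0.45}

flat = weighted_relative_ops(REL_OPS, PA_SHARE, {"empty": 1.0, "runners_on": 1.0})
leveraged = weighted_relative_ops(REL_OPS, PA_SHARE, {"empty": 1.0, "runners_on": 2.0})

print(f"Unweighted: {flat:.3f}  Leverage-weighted: {leveraged:.3f}")
```

With these placeholder shares, the unweighted figure is about 2% better than league (0.978) while the leverage-weighted one is about 8% better (0.917): the same split looks much more impressive once runners-on plate appearances carry twice the weight.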

So you have a pitcher that is substantially better... correction... a pitcher who has been ASSIGNED a performance record substantially better when it counts the most. And this explains why, when he's on the mound, he has among the lowest RA/9 in the league.

Do we want to credit Kralick with being on the mound getting better results with men on base, thereby limiting the impact of guys who got on base?

In other words: do we care about sequencing?

Or, do we prefer a "seasonal component" ERA, one that ASSUMES all performance is random in terms of the base-out state?

This was in effect "clutch pitching". Or "clutch results". And if we are trying to account for 101 runs allowed, and not the 110 or 120 (or whatever it is) that randomness would expect, then someone has to absorb that good result.

And you either give it to Kralick and/or his fielders and/or create a "timing-Kralick" bucket that acknowledges there was some 10 or whatever runs that were earned "on the knight's watch", but we don't know what to do with it.

Bill's methods are all about accounting for all those runs. So, we have to account for them, somewhere.

***

Fangraphs takes a polar opposite view, and assumes randomness of events, and ONLY targets BB, SO, HR, HBP of a pitcher. The rest are essentially assigned to fielders and/or timing.
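For reference, the standard FIP formula that the Fangraphs approach builds on uses only those four events. A Python sketch: Kralick's 21 HR, 64 BB, 137 SO and 242 IP come from the article, but his hit-batsman total and the league-season constant are placeholders here (the real constant is set each season so that league FIP matches league ERA, and the 1961 AL value may differ).

```python
def fip(hr: int, bb: int, hbp: int, so: int, ip: float, constant: float) -> float:
    """Fielding Independent Pitching: only HR, BB, HBP and SO enter the numerator."""
    return (13 * hr + 3 * (bb + hbp) - 2 * so) / ip + constant


HBP_PLACEHOLDER = 0           # not given in the thread
CONSTANT_PLACEHOLDER = 3.10   # assumed league-season constant

kralick_fip = fip(hr=21, bb=64, hbp=HBP_PLACEHOLDER, so=137, ip=242,
                  constant=CONSTANT_PLACEHOLDER)
print(f"Kralick FIP (placeholder constant): {kralick_fip:.2f}")
```

Everything that happens on balls in play, and the order in which events happen, drops out of this number entirely, which is why a FIP-based system and a runs-allowed-based system can disagree so sharply about a pitcher like Kralick.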

***

The true answer is somewhere in-between and since I know that we'll never come to consensus, I simply take a 50/50 approach of rWAR and fWAR and call it a day.

My Game Score v2 is in fact (a simplification of) that middle ground.

8:16 AM Jan 5th
 
steve161
To say that these results are a problem with WAR is at best an oversimplification and at worst grossly misleading. It's a problem (if it is, and I think it is) with using RA/9 as a basis for pitching WAR. Fangraphs, as is well known, uses FIP, and comes up with a very different result.

In principle, I prefer RA/9 to FIP as a basis, because I don't buy the argument that pitchers are not responsible for what happens to batted balls that stay in the park. But using RA/9 requires making adjustments for ballpark and defense that FIP can largely ignore. Every one of those adjustments introduces another element of uncertainty--an error bar--into the calculation. Every now and again the uncertainty overwhelms the calculation. I expect this is true of every system, including Win Shares, and it's telling that Bill introduced a fudge factor into the Historical Abstract's player ratings.

The question is: does a result like this cast so much doubt on RA/9-based pitching WAR that it's necessary to work out major refinements to the system? I don't think so, because I believe that the defensive side of baseball is so complex that any system will be similarly flawed. I'd be interested in Tom's take, particularly applied to pre-Statcast player evaluations.
7:03 AM Jan 5th
 
MarisFan61
Nice work.
That makes progress toward teasing it out.

I don't understand the details beneath some important parts, like RA9avg being "pitcher dependent," but I'll be looking closer at those things.
And, trying to convert the main points you all are getting at into plain English. :-)

Like: he gains this much on so-and-so factor compared to these other pitchers, and this other amount on this other factor compared to them.

JGF does it for Kralick and Ford on that one thing.
BTW there are things we can look at within the pitchers' game logs to see if there might be any practical aspects that make those numbers misleading.

---------

I'd love it for anyone to say if they think they've ever seen such a seemingly out-of-the-way result from Win Shares on such a prominent thing. I don't recall that I ever have.
2:06 AM Jan 5th
 
jgf704
While Kralick's 4 unearned runs are certainly an advantage, I'd say an equally important advantage is "context" (as Tango termed it). That is, the column that BB-Ref labels "RA9avg".

Following Tango's approach, one quick and dirty RAR (runs above replacement) is

RAR_1 = IP/9 * (ERA_lg + 1 - ERA)/.9

This assumes a replacement level is ERA_lg+1, and 10% of runs are unearned (the factor of 1.0 - 0.1 = 0.9 in the denominator).

But a q+d version of the BB-Ref RAR is

RAR_2 = IP/9 * (RA9avg+1 - RA)

where RA is "run average", i.e. runs allowed per 9 innings.

(of course, both could be converted to wins by dividing by 10)
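
The two quick-and-dirty formulas side by side, as a sketch. Kralick's 242 IP and 3.61 ERA come from the article, the 101 runs allowed from Tango's comment, and RA9avg = 5.08 from the figure quoted below; the 1961 AL league ERA of 4.02 is an assumption worth checking.

```python
def rar_era(ip: float, era: float, era_lg: float) -> float:
    """RAR_1: replacement level is ERA_lg + 1; dividing by 0.9 scales earned
    runs up to total runs, assuming 10% of runs are unearned."""
    return ip / 9 * (era_lg + 1 - era) / 0.9


def rar_ra(ip: float, runs: float, ra9avg: float) -> float:
    """RAR_2: replacement level is the pitcher-specific RA9avg + 1,
    compared with his actual runs allowed per 9 innings (RA)."""
    ra = runs * 9 / ip
    return ip / 9 * (ra9avg + 1 - ra)


RUNS_PER_WIN = 10  # the rough "divide by 10" conversion

# Kralick 1961: 242 IP, 3.61 ERA, 101 runs allowed, RA9avg 5.08.
# ASSUMPTION: an AL league ERA of 4.02 for 1961.
rar1 = rar_era(ip=242, era=3.61, era_lg=4.02)
rar2 = rar_ra(ip=242, runs=101, ra9avg=5.08)
print(f"RAR_1 = {rar1:.1f} ({rar1 / RUNS_PER_WIN:.1f} wins)")
print(f"RAR_2 = {rar2:.1f} ({rar2 / RUNS_PER_WIN:.1f} wins)")
```

With those inputs, RAR_1 comes out near 42 and RAR_2 near 62, in line with the Kralick figures quoted below.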

Yes, the BB-Ref version uses runs directly (i.e. giving up a bunch of unearned runs doesn't matter). But, more importantly, they use "RA9avg", which is pitcher dependent. Per BB-Ref, it appears to account for the actual teams faced, as well as the ballparks the pitcher appeared in.

Consider Whitey Ford vs. Kralick. Using the first q+d (i.e. with ERA and ERA_lg), Whitey beats Kralick 63 to 42.

But using the second q+d, Kralick wins, 62 to 43, an overall swing of 39 runs in Kralick's favor. And most of this is in the difference between pitcher and league runs allowed, i.e.

IP/9 * (RA9avg - RA_lg)

For Kralick, with his RA9avg of 5.08, this quantity is +13, whereas for Whitey, with RA9avg=3.86, this quantity is -23, so an overall swing of 36 in Kralick's favor.
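
Adding and subtracting RA_lg splits RAR_2 exactly into a context term and a performance term: IP/9 * (RA9avg - RA_lg) + IP/9 * (RA_lg + 1 - RA). A sketch of that identity for Kralick; the league RA of 4.60 is an assumption, chosen because it is roughly what the quoted +13 implies (5.08 - 13 / (242/9) ≈ 4.60).

```python
def context_runs(ip: float, ra9avg: float, ra_lg: float) -> float:
    """Runs credited purely for context: how far the pitcher-specific
    average (RA9avg) sits above the league run average."""
    return ip / 9 * (ra9avg - ra_lg)


def performance_runs(ip: float, runs: float, ra_lg: float) -> float:
    """Runs above replacement measured against the plain league average + 1."""
    ra = runs * 9 / ip
    return ip / 9 * (ra_lg + 1 - ra)


# Kralick: 242 IP, 101 R, RA9avg 5.08. ASSUMPTION: league RA of 4.60.
ip, runs, ra9avg, ra_lg = 242, 101, 5.08, 4.60
ctx = context_runs(ip, ra9avg, ra_lg)
perf = performance_runs(ip, runs, ra_lg)
print(f"context {ctx:+.1f} + performance {perf:+.1f} = RAR_2 {ctx + perf:.1f}")

# The swing between two pitchers is just the difference of their context terms;
# with the rounded figures quoted for Kralick (+13) and Ford (-23):
print(f"swing in Kralick's favor: {13 - (-23)} runs")
```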
10:30 PM Jan 4th
 
MarisFan61
.....Re Tom's sort of P.S., "You can NOT start picking and choosing which answers you like":

I think you're missing a big point. I don't think anyone's talking about picking and choosing which answers we like.

We're talking about what it means when a method gives odd results.
10:19 PM Jan 4th
 
MarisFan61
So: It's basically the "fielding adjustment"?
(Yes?)

It's that Kralick's "RA/9" gets improved very substantially, compared to most of those other pitchers (not necessarily Pascual) from what it would be if it were just pure 'runs-allowed-per-9-innings' because of poor fielding behind him?
(Yes?)

OK.

If so:
That leaves the rest:

Per what I said before, could you comment on the finding, as to what it means, in your judgment?
Do you think these 1961 results are likely to be truly meaningful; and if not, do you think they're just stray aberrations or that they perhaps point up basic flaws in the "WAR" system, and if so, what might those be?
10:15 PM Jan 4th
 
tangotiger
You can NOT start picking and choosing which answers you like.
9:27 PM Jan 4th
 
tangotiger
Those commenting by reading Bill's article and not my twitter link: please read the twitter link.

***

Ventboys has the perfect reaction:
"
If Kralick is the name spit out according to their methodology, so be it. Cardwell? Why not? If you call BS on those, after buying Grich and the various Evans's and Whitaker and selling Morris, Garvey and Dawson, you have to at least ask yourself what your own methodology is.
...
The reasoning seems sound to me, in its own way, and I think it's awesome that a conclusion as wild as this can be reasonably defended, if not easily accepted.
"

BRef is laying it all out there. You can see where everything is coming from. You can start picking and choosing which answers you like.

However: You can even decide "you know, I don't like the fielding adjustment, so I'll back that out". It's presented in a way where you CAN do this. So go ahead and do that. Count everyone as 0 for fielding adjustment.

The most important thing though is the basis, and BBRef went with RA/9 (not ERA). And so, having 4 unearned runs or 17 unearned runs means nothing. If he gave up 100 runs, he gave up 100 runs. That's the starting point. If you can't start there, then do NOT use WAR from baseball reference.

And since Kralick has the best combination of RA/9 and IP, the starting point is that he is #1.



9:26 PM Jan 4th
 
evanecurb
I can’t explain it, but there are several possibilities as to Kralick and Cardwell showing as number one on one site but not others:
1. Intangibles
2. Guts
3. Grit
4. Or it could be fake news planted by Russian hackers.
5. Racism and Bias: how else to explain Kralick and Cardwell over Pascual, O’Toole, and Ford (Catholics), Koufax (Jewish) and Brosnan (a writer/intellectual/liberal)

I’m sure it’s one of these. You don’t have to thank me. I’m happy to help.
8:58 PM Jan 4th
 
Manushfan
Kralick...Kralick. Dunno him. Cardwell rings a bell sorta. But this-this is silly. C'mon guys. This is like saying Ross Grimsley was the best AL pitcher in '74. You can bend and twist the stats all you wanna, but it don't work. I can see it if the team goes 64-98 and the guy's the league leader in ERA or has a scary good K-W rate, or whatever-but these guys aren't doing that. They're....pitching like Darren Oliver. I'm confused. But you know that already. I'll defer to Bill on this one.
7:34 PM Jan 4th
 
MarisFan61
Ventboys: re " I think it's awesome that a conclusion as wild as this can be reasonably defended" -- You think it has been?

Please explain it to me in plain English. :-)
I don't see that the material in the provided link says it or implies it. I see numbers, none of them extreme (which I think we'd need one or another of them to be), and without reference or comparison to what those numbers are for the pitchers that we'd expect to be up there.

Let me put it this way:
If it were Bill explaining/defending a highly counterintuitive Win Shares finding, I think we'd very likely see all of that, plus (important!) at least a small plain-English thing about what's the key aspect of the system and of the player that produced the odd result, AND (important!) whether he believes it, as opposed to whether he doesn't find it believable, in which case he'd probably indicate something about how this reflects a thing about the system that's not what it might be.

The extra stuff I was going to do was to get together the Win Share rankings for the 1961 pitchers. I see on Reader Posts that our member BarryBondsFan25 has already posted the rankings, apparently gotten from our site's "stat depository," done by our member Studes.

Here they are:

Luis Arroyo 23
Whitey Ford 22
Frank Lary 22
Don Mossi 20
Steve Barber 19
Jim Bunning 19
Juan Pizarro 19*
Jack Kralick 17
Ken McBride 17
Bill Monbouquette 17
Camilo Pascual 17
Bill Stafford 17

* (I was going to get my data from the Win Shares book.
I see that all of the above numbers match what's in the book except that the book shows 18 for Pizarro.)


I think we'd all agree that this is a more rational-seeming list than the WAR list. I don't mean that this necessarily means it's more 'right,' but I think it's safe to say that usually -- the majority of the time -- a more rational-looking list on something like this is more likely to be right.

I don't recall ever seeing any list of Win Share rankings on anything that's as irrational-looking as the WAR list for 1961's pitchers.
Does anyone?

In line with what Ventboys said -- "If Kralick is the name spit out according their methodology, so be it. Cardwell? Why not? If you call BS on those, after buying Grich and the various Evans's and Whitaker and selling Morris, Garvey and Dawson, you have to at least ask yourself what your own methodology is" -- I take such findings as more than just stray errors. I gotta think they reflect flaws of the system.

I'd love to know if Tom and/or any other strong advocates of "WAR" think that these 1961 results are likely to be truly meaningful; and if not, whether they're just stray aberrations or whether they point up basic flaws.
(All of that is in addition to still being at a loss about what it was about Kralick's year, "in plain English," that vaults him ahead of the others.)

Thank you for listening. :-)
7:23 PM Jan 4th
 
FrankD
Raincheck ..... like your comment. Except I would add that the main question is not how often the analytics are wrong, but why the analytics are wrong in this case. Determining how often these analytics are wrong would help in evaluating the worth of analytics. Understanding where the analytics fail will lead us to better analytic formulations. Kind of a Bayesian analysis.

for all:
Of course, '61 isn't 2019 and we now have enough data to not make huge analytic mistakes, right???? Just a reminder for all of us who sometimes think we have the answer, and I've been wrong many times doing analytics in the oil business and stock market ....... .
6:47 PM Jan 4th
 
raincheck
Well, we make these imperfect adjustments to the historical record, and in general they provide a better understanding of a player’s true value.

Then we stack them on top of each other. A park effect estimate. An attempt to measure a team’s defense in 1961. Etc.

It is inevitable that there will be cases where these adjustments distort the record more than they clarify it. The question is, how often?

These are two cases where it did.
6:27 PM Jan 4th
 
FrankD
Jack Kralick, Camilo Pascual best in 1961? ..... this begs the question of why these two seem better in statistical analysis than most would glean from their basic ERA, Wins, Losses. Was there something unique about the Twins and their Stadium in '61? Maybe the weather? I dunno, but it's very suspicious that two 'outliers' are on the same team. I do applaud that some will stick with their analysis to the end. But sometimes you have to look and say, what is wrong here. Would you have spent millions on the (pretend) free agents of 1962: Kralick and Pascual? Did their future performance trend with '61 analytics ranking? It may be that '61 Twinkies do not fit the standard analysis for some reasons. And the '61 Twinks weren't a bad team. Soon to set a non-expansion HR team record and to contend through the whole decade ....
6:22 PM Jan 4th
 
chuck
In BB-Reference’s Player Value section, as Tom says, they break down how they estimate what the average pitcher’s R/9 would be, given pitcher X’s context (opponents, defensive support, and park). Another factor is also used to separate starters from relievers.

For defensive support, Kralick is estimated to have received .09 runs per 9 below average (-.09). Pascual’s number here is -.08 R/9. These defensive support estimates come from BBR’s team defensive runs above or below average, and, if I understand right, are assigned to each pitcher largely on the basis of their individual percentage of balls in play.
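
If that reading of the method is right, the allocation amounts to splitting one team total by each pitcher's share of balls in play and restating it per 9 innings. A sketch with made-up numbers (the team total, ball-in-play counts and innings below are hypothetical, not Baseball Reference's data): pitchers who allow balls in play at similar rates per inning end up with nearly identical per-9 figures, which would fit Kralick (-.09) and Pascual (-.08) landing so close together.

```python
def allocate_defense(team_def_runs: float,
                     bip: dict[str, int],
                     ip: dict[str, float]) -> dict[str, float]:
    """Split a team's defensive runs (above/below average) among pitchers by
    share of balls in play, then express each share per 9 innings."""
    total_bip = sum(bip.values())
    per9 = {}
    for name, n in bip.items():
        share_runs = team_def_runs * n / total_bip
        per9[name] = share_runs * 9 / ip[name]
    return per9


# HYPOTHETICAL staff: a defense 15 runs below average, three pitchers.
bip = {"A": 700, "B": 690, "C": 610}
ip = {"A": 240.0, "B": 235.0, "C": 200.0}
for name, r9 in allocate_defense(-15.0, bip, ip).items():
    print(f"{name}: {r9:+.3f} runs per 9")
```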

As has been mentioned, though, Kralick had just 4 unearned runs that season, compared to Pascual (17). The league ratio of runs to earned runs was 1.138. For Pascual it was 1.175 and for Kralick just 1.04. That means Pascual got about 4 runs below average, in terms of defensive support related to that R/ER ratio, and Kralick got about 9 runs of support ABOVE average. One might fine-tune this, perhaps, by counting unearned runs as earned if the pitcher himself made the error.
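
A sketch of that R/ER support calculation: the runs a pitcher would have allowed at the league R/ER ratio of 1.138, minus the runs he actually allowed. Kralick's 101 runs and 4 unearned runs are from the thread; Pascual's earned runs are backed out from his 17 unearned runs and his 1.175 ratio (17 / 0.175 ≈ 97) rather than looked up.

```python
LEAGUE_R_PER_ER = 1.138  # league runs / earned runs, as quoted


def unearned_run_support(runs: float, unearned: float,
                         lg_ratio: float = LEAGUE_R_PER_ER) -> float:
    """Runs of defensive support implied by the R/ER ratio: the runs a pitcher
    'should' have allowed at the league ratio, minus the runs he did allow.
    Positive means fewer unearned runs than a league-average pitcher."""
    earned = runs - unearned
    return earned * lg_ratio - runs


# Kralick: 101 R, 4 unearned (97 ER).
kralick = unearned_run_support(runs=101, unearned=4)

# Pascual: 17 unearned at a 1.175 ratio implies ER = 17 / 0.175, about 97.
pascual_er = 17 / (1.175 - 1)
pascual = unearned_run_support(runs=pascual_er + 17, unearned=17)

print(f"Kralick {kralick:+.1f}, Pascual {pascual:+.1f}, gap {kralick - pascual:.1f}")
```

This gives about +9.4 for Kralick and -3.6 for Pascual, a gap of about 13 runs, matching the figures in the comment.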

Anyway, that’s a difference between them of about 13 runs, which is a fairly large number when one then figures it as a runs per 9 of support. It’s about .47 runs per 9 innings of difference, given 250 innings. Kralick’s number alone would be worth +.33 R/9, if one was using that R/ER ratio way of looking at it. BBR doesn’t try to parse out who got more defensive support so much as cut up a defensive support pie. I would think that using unearned runs (or their absence) might be a possible way to assign further such support, rather than basically give everyone about the same number. Another way to divide the pie. Of course, we don’t know from the stats which hits were line drive no-doubters and which were ground singles untouched due to poor range, so perhaps Kralick, who gave up many hits, did have poor support in terms of his fielders not getting to balls. But we do know that this or that specific run was UNearned, due to an error, so why not incorporate these into defensive support?

Adjusting Kralick’s -.09 R/9 of defensive support by +.33 would, at the very least, shave around a win off his WAR.
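
The per-9 conversions and the "around a win" estimate are simple arithmetic; a sketch assuming the usual rough 10 runs per win.

```python
RUNS_PER_WIN = 10  # rough runs-to-wins conversion used elsewhere in the thread


def runs_to_per9(runs: float, ip: float) -> float:
    """Restate a season run total as runs per 9 innings."""
    return runs * 9 / ip


def per9_to_wins(r9: float, ip: float, runs_per_win: float = RUNS_PER_WIN) -> float:
    """Convert a runs-per-9 adjustment back to wins over a given workload."""
    return r9 * ip / 9 / runs_per_win


gap_r9 = runs_to_per9(13, 250)        # ~0.47 R/9 between Kralick and Pascual
kralick_r9 = runs_to_per9(9, 242)     # ~0.33 R/9 for Kralick alone
new_support = -0.09 + kralick_r9      # BBR's -.09 adjusted by the R/ER figure
print(f"gap {gap_r9:.2f} R/9, Kralick {kralick_r9:.2f} R/9, "
      f"adjusted support {new_support:+.2f} R/9, "
      f"about {per9_to_wins(kralick_r9, 242):.1f} WAR off")
```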
5:41 PM Jan 4th
 
ventboys
Bill's response reminds me of Peter Gammons, wondering why Bill would "predict" a rookie to win a batting title.
4:26 PM Jan 4th
 
ventboys
I don't know if it's correct or not, but I salute BBR for going with their results. I rail about WAR as much as anyone, but not because I think it's flawed so much as I don't think its more enthusiastic advocates understand that it is an approximation of a reflection rather than a profound truth, written on a tablet and tossed off a mountain.

If Kralick is the name spit out according to their methodology, so be it. Cardwell? Why not? If you call BS on those, after buying Grich and the various Evans's and Whitaker and selling Morris, Garvey and Dawson, you have to at least ask yourself what your own methodology is.

I think Bill's response is the correct one, if it matters what I think. He is skeptical, but not simply dismissive. He did not tell Tango he was wrong, he asked Tango to explain it so he could understand.

The reasoning seems sound to me, in its own way, and I think it's awesome that a conclusion as wild as this can be reasonably defended, if not easily accepted. It means we still don't know everything. The day we know everything will be a sad, sad day.


4:20 PM Jan 4th
 
MarisFan61
Michael: Thanks for the reply -- and I have to say that it sounds like while you aren't putting it anything like how I did, you're at a loss too.

Perhaps the difference in how we put it, besides that maybe you're a nicer guy than I am :-) (although I'm really pretty nice) is that you have a greater basic faith in the "WAR" system, and so you're willing and happy to take it on faith that there's a reasonable basis for how Kralick comes out, AND (most relevantly for our point of departure) that you think the link provided by Tom does indeed elucidate this meaningfully.

I don't think it does.

I'll be saying more about this later, including how I think it all reflects on the "WAR" system in general, but before I do that, I'll want to get together some info on how Win Shares sees these 1961 pitchers. (Don't have the materials right now that I'd need for it, and I don't want to just use Baseball Gauge's data for this.)
3:45 PM Jan 4th
 
MWeddell
Maris:

I've not looked into it in detail, although I was about to post that having only 4 unearned runs helps a little, before I saw that Tangotiger (indirectly) pointed that out. If I spend the time to look at the bb-refWAR calculation for him, I'll repost here. There are some intermediate results posted in baseball-reference, not just the final number, that may (or may not!) answer this question.

-- Michael
1:12 PM Jan 4th
 
MarisFan61
(MWeddell: If you really get it, maybe you can help out with a thing or two....

In a few words or even more than few, in plain English, what is it [or are it] that elevates him over the others?
And, I'd think at least one such thing would need to be real big. As yet I don't see any such.
I don't doubt it's there somewhere.)
11:40 AM Jan 4th
 
MarisFan61
Tom: I'm not sure that helps much. (Doesn't for me.)
It tells some technical stuff but doesn't (as near as I can tell) get into why those things would have given Kralick such a great advantage over the others who 'appear to be' much better that he'd even pull even with them, to say nothing of pulling ahead.

I think you'd need to cite things about Kralick together with what are those things about at least a couple of the others.
11:35 AM Jan 4th
 
MWeddell
Thank you, Tangotiger (Tom Tango). For anyone else having difficulty following his link, just Google "tangotiger twitter" and look for several tweets from today on the topic.
11:32 AM Jan 4th
 
tangotiger
If you click the link, it loses the colon ( : ) after https for some reason.
10:30 AM Jan 4th
 
tangotiger
I posted the reasoning on Twitter:

https://twitter.com/tangotiger/status/1081221095393579009


10:26 AM Jan 4th
 
bjames
Translating those numbers into WAR, I get Ford as the #1 pitcher of the season at 7.2, followed by Koufax (6.9), Bunning (6.6), Pascual (6.6), O'Toole (6.2), Lary (6.0), Pizarro (6.0), Spahn (5.9), and Steve Barber (5.9). I have Cardwell at 4.7 and Kralick at 4.4. . .not that those are bad numbers; 4-5 WAR is pretty good.
8:50 AM Jan 4th
 
bjames
For what it is worth. . .I'll explain all of this later, but for what it is worth on the surface of it. . .looking at every start and adjusting for the quality of competition and the park, I have Ford as deserving of a 19-12 record in 1961, whereas I have Bunning at 18-11, Koufax at 17-10, Pascual at 16-9. But I have Kralick at 14-11, Cardwell at 15-13. Frank Lary at 16-11, Juan Pizarro at 14-6.
8:38 AM Jan 4th
 
bjames
I actually buy that Pascual was one of the 2-3 best pitchers in the league, despite a 15-16 record and a 3.46 ERA. I think he really was. But Kralick. . . .not buyin' it.
7:40 AM Jan 4th
 
StatsGuru
Could it be an artifact of expansion? Expansion tends to skew the distribution of talent in the majors.
5:18 AM Jan 4th
 
stevebogus
BB-Ref has Kralick's teammate Camilo Pascual as the second best pitcher in the AL. Whitey Ford doesn't make the top ten. I think this is an example of the way things can go wrong when you try to equalize/normalize everything based on limited data (a few seasons for park effects, a single season for fielding).

1961 was Ralph Houk's first season as manager and he worked Ford harder than Casey ever did, 39 starts including 27 times on 3 days rest.

In Ford's four losses the Yankees scored fewer than 3 runs. When given three or more runs Whitey was 25-0. Ford's ERA was 2.45 in his wins, 2.63 in his losses, and 5.93 in his ten no-decision games. The Yankees pulled out wins in 9 of those, which is why Ford only had 4 losses. He probably should have been 25-9 or so.


4:20 AM Jan 4th
 
FreeKresge
In the National League, it looks like Baseball Reference thinks that RoelTorres is correct. It sees Cardwell and Fangraphs WAR champion Koufax as being similar in most respects (with Koufax pitching in a stadium that was a bit more hitter friendly) except that Koufax had merely a bad defense behind him (-0.20 runs/9 IP) whereas Cardwell had a horrid defense behind him (-0.46 runs/9 IP).

I can see why Baseball Reference (and Fangraphs) believes that Spahn was not one of the best pitchers that year. Besides having a great defense behind him, his home/away splits are as follows:

Home: 2.03 ERA in 168 1/3 IP
Away: 4.77 ERA in 94 1/3 IP

In other words, Spahn benefited from pitching over 64% of his innings in a very pitcher-friendly park.
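
The "over 64%" is an innings share. A sketch that keeps innings as exact thirds (box scores write 168 1/3 as "168.1", where the digit after the point counts outs, not tenths); the combined ERA line is only a cross-check that weights the two split ERAs by innings.

```python
from fractions import Fraction


def parse_ip(ip: str) -> Fraction:
    """Parse baseball innings notation, where '.1' and '.2' mean one and two
    outs (thirds of an inning), not tenths."""
    whole, _, outs = ip.partition(".")
    outs_n = int(outs) if outs else 0
    if outs_n not in (0, 1, 2):
        raise ValueError(f"invalid innings notation: {ip}")
    return int(whole) + Fraction(outs_n, 3)


home, away = parse_ip("168.1"), parse_ip("94.1")  # Spahn's home and road innings
home_share = home / (home + away)
# Innings-weighted ERA across the two splits.
combined_era = (Fraction("2.03") * home + Fraction("4.77") * away) / (home + away)
print(f"home share {float(home_share):.1%}, combined ERA {float(combined_era):.2f}")
```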

In the American League, defense appeared to play a role as well with Baseball Reference seeing Fangraphs WAR champion Bunning as having a very good defense behind him (+0.35 runs/9 IP) and Kralick as having a below average defense behind him (-0.09 runs/9 IP). It also sees Kralick as pitching against tougher opposition in a more hitter-friendly park.

Ford, meanwhile, never had to face Mantle or Maris, pitched in a very pitcher-friendly park (albeit with splits not nearly as dramatic as Spahn's), and had a great defense behind him.

Personally, I think that Fangraphs passes the eyeball test better than Baseball Reference does, at least for 1961. For example, it seems to understand that Koufax compensated for his poor defense by not letting batters hit the ball in the first place.

12:22 AM Jan 4th
 
sayhey
Was curious how they fared on Fangraphs (combined leaderboard for both leagues): Kralick, 5.1 (7th), Cardwell, 4.3 (12th). So they still do reasonably well. Bunning and Koufax are tied at #1 with 6.3.
11:25 PM Jan 3rd
 
MarisFan61
I remember Kralick as part of the first pitching staff that struck me as having a gajillion real legit starters -- for example, pitchers any of which could conceivably wind up on the all star team. I still remember them:
Tiant, Terry, McDowell, Kralick, Siebert, Stange (Cleveland, 1965)
(How many of them did make the all star team: just 1.)
I decided it meant they'd win the pennant. They were in the running most of the year but wound up not close.
Kralick, as it turned out, was the only one of the 6 who didn't have a decent year, but for me he's forever a member of this, um, historic staff.
Things that feel historic for us are sometimes more personal than sane.

-----------------

I think someone ought to mention (OK, I volunteer) that unless I'm really missing the boat, Bill is being sarcastic -- he's saying that there is no easily imaginable or unimaginable way to understand what it could be about baseball-ref.com's "WAR" system that makes Kralick come out as #1 in the league. Whether or not that's what Bill means, I mean it. It's beyond a head-scratcher.

Is it that Minnesota had an unusually extreme hitter-friendly Park Factor?
No, it didn't.
(a little, but that's all)

Did he nevertheless somehow have an unusually impressive "E.R.A.+"?
No, he didn't.
(good, but not beyond good)

Could it have been something like what Roel says?
Maybe, I guess, but I didn't think "WAR" took stuff like that into account in crunching the numbers.

....not that I can think of anything else, though, so.....

But, let's see, do they seem like an exceedingly awful defensive team, from the list of regulars?
I don't think so. Not real good but not awful -- and, for what it's worth, the team's overall "dWAR" isn't awful; yes, bad, but not worst in the league, not awful.

It's a mixtery. :-)
10:20 PM Jan 3rd
 
FrankD
Bill, thanks for mentioning Jack Kralick. As a Twins fan I would often see some write-up mentioning Kralick as first Twins pitcher to pitch no-hitter. At age 5 I was too young to remember the game. I cannot see how anybody would pick Kralick as best pitcher in AL in 1961. Maybe he gets extra credit for having a lot of Ks in his name ......
10:11 PM Jan 3rd
 
rwarn17588
I love WAR as a comparison tool, but Lord ... this doesn't pass the smell test at all. There are at least a dozen pitchers in the AL and NL on those lists that are indisputably better than Kralick or Cardwell.

There sometimes are these things called anomalies or statistical noise. This seems to be both.
9:57 PM Jan 3rd
 
RoelTorres
Taking a stab in the dark -- Is it possible that Kralick and Cardwell played in front of terrible defenses and that they pitched really well, but received no defensive support and a lot of balls that should have been caught would fall in, and every ground ball would get through? This might help explain why a guy could pitch better than anyone, but have the stats look surprisingly mundane.
9:29 PM Jan 3rd
 
 
©2019 Be Jolly, Inc. All Rights Reserved.