After the article about Pitcher Consistency I had a question about the relationship between Game Scores and Won-Lost records. I’ve written that up before, of course, but I can research the issue better now than I could before. Let’s eliminate the levels of Game Scores where we have fewer than 100 games to work with.
In the data from 1952 through 2011 we have 108 games with Game Scores of 93. In those 108 games the won-lost record of the starting pitchers (who had Game Scores of 93) is 96-1, and their ERA is 0.06. The won-lost record of their teams is 98-10, a .907 percentage:
| SP Game Score | Count | SP Win | SP Loss | SP WPct | Tm Win | Tm Loss | Tm WPct | ERA |
| 93 | 108 | 96 | 1 | .990 | 98 | 10 | .907 | 0.06 |
| 92 | 162 | 143 | 3 | .979 | 151 | 11 | .932 | 0.11 |
| 91 | 218 | 198 | 4 | .980 | 204 | 14 | .936 | 0.14 |
| 90 | 279 | 261 | 2 | .992 | 267 | 12 | .957 | 0.08 |
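As a quick check on how the two percentages in these tables are formed, here is a minimal sketch that recomputes them for the Game Score 93 row. The function name `win_pct` is just an illustrative choice; the only assumption is that each percentage is wins divided by wins plus losses, which is why the starter's figure rests on 97 decisions while the team's rests on all 108 games.

```python
def win_pct(wins, losses):
    """Winning percentage as wins / (wins + losses); None if there are no decisions."""
    decisions = wins + losses
    return wins / decisions if decisions else None

# Row for Game Score 93, copied from the table above
sp_wins, sp_losses = 96, 1      # starting pitcher decisions
tm_wins, tm_losses = 98, 10     # team results in the same 108 games

sp_pct = win_pct(sp_wins, sp_losses)
tm_pct = win_pct(tm_wins, tm_losses)
print(f"SP {sp_pct:.3f}  Team {tm_pct:.3f}")
```

The gap between the two numbers comes from the no-decisions: the starter left 11 of those 108 games without a win or a loss.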
The ERA and the Winning Percentage of the Starting Pitcher change only a little with Game Scores in the eighties:
| SP Game Score | Count | SP Win | SP Loss | SP WPct | Tm Win | Tm Loss | Tm WPct | ERA |
| 89 | 375 | 346 | 5 | .986 | 358 | 17 | .955 | 0.12 |
| 88 | 418 | 383 | 4 | .990 | 399 | 19 | .955 | 0.12 |
| 87 | 559 | 519 | 7 | .987 | 533 | 25 | .955 | 0.15 |
| 86 | 701 | 645 | 7 | .989 | 668 | 33 | .953 | 0.15 |
| 85 | 895 | 834 | 13 | .985 | 855 | 40 | .955 | 0.17 |
| 84 | 964 | 885 | 20 | .978 | 912 | 51 | .947 | 0.22 |
| 83 | 1150 | 1065 | 24 | .978 | 1094 | 56 | .951 | 0.30 |
| 82 | 1365 | 1226 | 34 | .973 | 1280 | 85 | .938 | 0.33 |
| 81 | 1446 | 1308 | 47 | .965 | 1352 | 93 | .936 | 0.41 |
| 80 | 1563 | 1391 | 51 | .965 | 1446 | 117 | .925 | 0.44 |
In the 70s, starting pitcher winning percentage goes down by about 10 points for each point on the Game Score. A Game Score of 73 is more or less equivalent to an ERA of 1.00; a Game Score of 74 is more or less equivalent to a .900 winning percentage.
| SP Game Score | Count | SP Win | SP Loss | SP WPct | Tm Win | Tm Loss | Tm WPct | ERA |
| 79 | 1718 | 1503 | 66 | .958 | 1557 | 161 | .906 | 0.52 |
| 78 | 1953 | 1678 | 99 | .944 | 1754 | 198 | .899 | 0.57 |
| 77 | 2053 | 1733 | 98 | .946 | 1828 | 223 | .891 | 0.67 |
| 76 | 2248 | 1879 | 127 | .937 | 1983 | 265 | .882 | 0.76 |
| 75 | 2403 | 1949 | 191 | .911 | 2079 | 323 | .866 | 0.84 |
| 74 | 2552 | 1967 | 228 | .896 | 2134 | 418 | .836 | 0.91 |
| 73 | 2692 | 2052 | 272 | .883 | 2236 | 456 | .831 | 0.99 |
| 72 | 3045 | 2244 | 321 | .875 | 2448 | 594 | .805 | 1.07 |
| 71 | 3083 | 2231 | 335 | .869 | 2451 | 631 | .795 | 1.14 |
| 70 | 3248 | 2258 | 384 | .855 | 2558 | 690 | .788 | 1.22 |
Going from 70 to 60 adds a run a game to the ERA. A Game Score of 61 is good enough to win half of your starts; a Game Score of 60 isn’t. You’ll win most of your decisions at 60, but a little less than half of your starts:
| SP Game Score | Count | SP Win | SP Loss | SP WPct | Tm Win | Tm Loss | Tm WPct | ERA |
| 69 | 3502 | 2379 | 461 | .838 | 2662 | 839 | .760 | 1.33 |
| 68 | 3626 | 2342 | 559 | .807 | 2658 | 964 | .734 | 1.43 |
| 67 | 3867 | 2482 | 579 | .811 | 2868 | 996 | .742 | 1.53 |
| 66 | 3973 | 2477 | 613 | .802 | 2898 | 1074 | .730 | 1.63 |
| 65 | 4181 | 2498 | 736 | .772 | 2925 | 1254 | .700 | 1.75 |
| 64 | 4300 | 2429 | 846 | .742 | 2884 | 1412 | .671 | 1.86 |
| 63 | 4495 | 2557 | 845 | .752 | 3074 | 1419 | .684 | 1.98 |
| 62 | 4498 | 2319 | 955 | .708 | 2867 | 1630 | .638 | 2.05 |
| 61 | 4629 | 2335 | 1038 | .692 | 2912 | 1712 | .630 | 2.22 |
| 60 | 4646 | 2281 | 1074 | .680 | 2891 | 1751 | .623 | 2.35 |
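The distinction between winning most of your decisions and winning half of your starts is easy to lose, so here is a small sketch using the rows for 61 and 60 from the table above. The helper name `rates` is hypothetical; it simply divides wins by decisions and then by starts (the Count column).

```python
# (starts, starter wins, starter losses) copied from the table above
rows = {61: (4629, 2335, 1038), 60: (4646, 2281, 1074)}

def rates(starts, wins, losses):
    """Return (wins per decision, wins per start)."""
    return wins / (wins + losses), wins / starts

per_decision = {}
per_start = {}
for score, (starts, wins, losses) in rows.items():
    per_decision[score], per_start[score] = rates(starts, wins, losses)
    print(f"GS {score}: {per_decision[score]:.3f} of decisions, {per_start[score]:.3f} of starts")
```

At 61 the starter wins a shade over half of his starts; at 60 he wins a shade under half, even though he still wins about 68% of his decisions.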
If you ask the question "What is the most common Game Score for the Winning Pitcher in a Game?", the answer is "65", but the graph is very flat. There are 2000+ wins (in the data) for every Game Score from 56 to 73, but with a peak of less than 2500. And, as we go from 60 to 50, we go from good to average:
| SP Game Score | Count | SP Win | SP Loss | SP WPct | Tm Win | Tm Loss | Tm WPct | ERA |
| 59 | 4896 | 2295 | 1169 | .663 | 2998 | 1895 | .613 | 2.49 |
| 58 | 4794 | 2148 | 1168 | .648 | 2856 | 1937 | .596 | 2.62 |
| 57 | 4999 | 2158 | 1335 | .618 | 2885 | 2109 | .578 | 2.76 |
| 56 | 5007 | 2067 | 1382 | .599 | 2807 | 2198 | .561 | 2.93 |
| 55 | 5050 | 1947 | 1477 | .569 | 2717 | 2324 | .539 | 3.04 |
| 54 | 5004 | 1899 | 1453 | .567 | 2689 | 2312 | .538 | 3.19 |
| 53 | 4889 | 1757 | 1503 | .539 | 2583 | 2300 | .529 | 3.40 |
| 52 | 5128 | 1761 | 1663 | .514 | 2626 | 2496 | .513 | 3.53 |
| 51 | 5035 | 1646 | 1645 | .500 | 2566 | 2469 | .510 | 3.75 |
| 50 | 4868 | 1484 | 1609 | .480 | 2407 | 2455 | .495 | 3.91 |
A Game Score of 51 will actually get the starting pitcher a winning percentage of .500. A Game Score of 50 will get the team a winning percentage of .500. It would be cool if a Game Score of 56 led to a .560 winning percentage, etc., but it doesn’t quite work that way. The winning percentages diverge away from .500 a little more quickly than the Game Scores move away from 50, and then the winning percentages flatten out, above 70/.700, so that the Game Scores can catch up. A Game Score of 50 is a 3.91 ERA. Below 50, the starting pitcher’s winning percentage drops like a bucket of paint:
| SP Game Score | Count | SP Win | SP Loss | SP WPct | Tm Win | Tm Loss | Tm WPct | ERA |
| 49 | 4771 | 1369 | 1734 | .441 | 2211 | 2557 | .464 | 4.10 |
| 48 | 4668 | 1281 | 1769 | .420 | 2128 | 2536 | .456 | 4.32 |
| 47 | 4546 | 1105 | 1839 | .375 | 1941 | 2602 | .427 | 4.51 |
| 46 | 4503 | 1082 | 1823 | .372 | 1918 | 2581 | .426 | 4.72 |
| 45 | 4278 | 959 | 1819 | .345 | 1779 | 2495 | .416 | 4.95 |
| 44 | 4355 | 918 | 1929 | .322 | 1722 | 2630 | .396 | 5.15 |
| 43 | 4267 | 831 | 1927 | .301 | 1643 | 2624 | .385 | 5.41 |
| 42 | 4142 | 747 | 1912 | .281 | 1508 | 2634 | .364 | 5.67 |
| 41 | 4110 | 633 | 2035 | .237 | 1456 | 2653 | .354 | 5.89 |
| 40 | 3940 | 557 | 1953 | .222 | 1327 | 2611 | .337 | 6.22 |
So the starting pitcher’s ERA goes from 2.35 to 6.22 as the Game Score drops from 60 to 40. With a Game Score of 45, you lose about two-thirds of your decisions and have an ERA of about 5.00. The most common Game Score for a losing pitcher is 34:
| SP Game Score | Count | SP Win | SP Loss | SP WPct | Tm Win | Tm Loss | Tm WPct | ERA |
| 39 | 3921 | 530 | 1987 | .211 | 1284 | 2634 | .328 | 6.49 |
| 38 | 3900 | 468 | 2026 | .188 | 1265 | 2631 | .325 | 6.78 |
| 37 | 3810 | 359 | 2101 | .146 | 1140 | 2668 | .299 | 7.15 |
| 36 | 3585 | 319 | 2046 | .135 | 1015 | 2565 | .284 | 7.47 |
| 35 | 3697 | 266 | 2126 | .111 | 1022 | 2675 | .276 | 7.94 |
| 34 | 3554 | 210 | 2184 | .088 | 890 | 2663 | .250 | 8.27 |
| 33 | 3348 | 186 | 2038 | .084 | 801 | 2545 | .239 | 8.87 |
| 32 | 3339 | 148 | 2107 | .066 | 763 | 2576 | .229 | 9.28 |
| 31 | 3317 | 156 | 2168 | .067 | 747 | 2567 | .225 | 9.70 |
| 30 | 3168 | 111 | 2064 | .051 | 670 | 2496 | .212 | 10.28 |
At 30, your ERA is a little over 10.00 and your winning percentage is about .050. You wouldn’t think those numbers could get worse, but they do:
| SP Game Score | Count | SP Win | SP Loss | SP WPct | Tm Win | Tm Loss | Tm WPct | ERA |
| 29 | 2952 | 101 | 2005 | .048 | 585 | 2367 | .198 | 10.83 |
| 28 | 2869 | 94 | 1979 | .045 | 553 | 2315 | .193 | 11.41 |
| 27 | 2701 | 57 | 1898 | .029 | 502 | 2197 | .186 | 11.93 |
| 26 | 2630 | 49 | 1890 | .025 | 473 | 2155 | .180 | 12.60 |
| 25 | 2339 | 26 | 1732 | .015 | 374 | 1965 | .160 | 13.25 |
| 24 | 2095 | 36 | 1547 | .023 | 331 | 1764 | .158 | 13.63 |
| 23 | 1994 | 25 | 1520 | .016 | 279 | 1714 | .140 | 14.50 |
| 22 | 1775 | 17 | 1336 | .013 | 281 | 1494 | .158 | 15.23 |
| 21 | 1616 | 13 | 1295 | .010 | 199 | 1416 | .123 | 15.65 |
| 20 | 1439 | 14 | 1164 | .012 | 171 | 1267 | .119 | 16.38 |
At this point, the wins are just flukes—games in which both starting pitchers are hammered—so the data is unstable. The one pitcher who was credited with a Win with a Game Score of 11 was Russ Meyer of the Dodgers on September 9, 1955. Meyer pitched 7 innings against the Cubs, gave up 16 hits, 8 runs all earned, walked 2 and struck out no one, but was credited with a win in a 16-9 victory, matched up against Toothpick Sam in the second game of a double header. Below 11, we have no wins at all for Starting Pitchers:
| SP Game Score | Count | SP Win | SP Loss | SP WPct | Tm Win | Tm Loss | Tm WPct | ERA |
| 19 | 1184 | 8 | 954 | .008 | 147 | 1037 | .124 | 16.47 |
| 18 | 1100 | 4 | 922 | .004 | 113 | 987 | .103 | 17.09 |
| 17 | 910 | 7 | 756 | .009 | 86 | 824 | .095 | 18.20 |
| 16 | 808 | 3 | 692 | .004 | 66 | 742 | .082 | 18.77 |
| 15 | 696 | 4 | 596 | .007 | 57 | 639 | .082 | 19.79 |
| 14 | 607 | 3 | 520 | .006 | 54 | 553 | .089 | 19.83 |
| 13 | 507 | 1 | 440 | .002 | 36 | 471 | .071 | 20.38 |
| 12 | 443 | 2 | 390 | .005 | 28 | 415 | .063 | 21.51 |
| 11 | 375 | 1 | 334 | .003 | 29 | 346 | .077 | 22.38 |
| 10 | 288 | 0 | 255 | .000 | 19 | 269 | .066 | 22.96 |
And we run out of sufficient data at a Game Score of 5:
| SP Game Score | Count | SP Win | SP Loss | SP WPct | Tm Win | Tm Loss | Tm WPct | ERA |
| 9 | 231 | 0 | 201 | .000 | 17 | 214 | .074 | 23.90 |
| 8 | 203 | 0 | 186 | .000 | 12 | 191 | .059 | 23.04 |
| 7 | 181 | 0 | 154 | .000 | 13 | 168 | .072 | 24.45 |
| 6 | 133 | 0 | 124 | .000 | 3 | 130 | .023 | 26.16 |
| 5 | 103 | 0 | 95 | .000 | 6 | 97 | .058 | 26.49 |
This chart puts all the winning percentages in one place, so you can print it out and carry it in your wallet, make a wall chart out of it or something:
Starting Pitcher Winning Percentage, By Game Score
|  | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
| 0-- | .000 | .000 | .000 | .000 | .000 | .000 | .000 | .000 | .000 | .000 |
| 1-- | .001 | .002 | .003 | .004 | .005 | .006 | .007 | .008 | .009 | .010 |
| 2-- | .012 | .011 | .013 | .017 | .020 | .022 | .027 | .031 | .040 | .048 |
| 3-- | .053 | .059 | .070 | .081 | .098 | .115 | .136 | .151 | .180 | .208 |
| 4-- | .224 | .242 | .279 | .300 | .322 | .345 | .371 | .381 | .420 | .442 |
| 5-- | .476 | .497 | .514 | .538 | .564 | .572 | .599 | .618 | .645 | .661 |
| 6-- | .679 | .694 | .711 | .731 | .751 | .772 | .790 | .808 | .822 | .836 |
| 7-- | .852 | .867 | .875 | .885 | .898 | .911 | .924 | .936 | .944 | .956 |
| 8-- | .963 | .965 | .972 | .977 | .978 | .983 | .984 | .985 | .986 | .987 |
| 9-- | .988 | .989 | .990 | .991 | .992 | .994 | .995 | .997 | .998 | .999 |
| 10-- | 1.000 |  |  |  |  |  |  |  |  |  |
I grouped the data to even out the gaps a little bit. For example, a "58" has yielded a .648 winning percentage in the actual data, but is shown at .645 in this chart. And here’s the same chart, for ERA:
ERA, By Game Scores
|  | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
| 0-- | 27.92 | 27.55 | 26.40 | 26.17 | 25.93 | 25.60 | 25.18 | 24.50 | 23.82 | 23.46 |
| 1-- | 22.86 | 22.25 | 21.52 | 20.52 | 20.03 | 19.65 | 18.73 | 18.16 | 17.22 | 16.59 |
| 2-- | 16.33 | 15.62 | 15.16 | 14.49 | 13.70 | 13.24 | 12.60 | 11.95 | 11.42 | 10.84 |
| 3-- | 10.30 | 9.74 | 9.29 | 8.86 | 8.31 | 7.95 | 7.50 | 7.16 | 6.80 | 6.50 |
| 4-- | 6.22 | 5.91 | 5.67 | 5.42 | 5.16 | 4.95 | 4.73 | 4.52 | 4.32 | 4.11 |
| 5-- | 3.92 | 3.75 | 3.54 | 3.40 | 3.21 | 3.05 | 2.93 | 2.76 | 2.62 | 2.49 |
| 6-- | 2.35 | 2.22 | 2.07 | 1.98 | 1.86 | 1.75 | 1.64 | 1.53 | 1.43 | 1.33 |
| 7-- | 1.23 | 1.15 | 1.07 | 0.99 | 0.91 | 0.84 | 0.76 | 0.67 | 0.58 | 0.52 |
| 8-- | 0.45 | 0.41 | 0.36 | 0.31 | 0.26 | 0.23 | 0.19 | 0.15 | 0.13 | 0.11 |
| 9-- | 0.10 | 0.09 | 0.08 | 0.07 | 0.06 | 0.05 | 0.04 | 0.03 | 0.02 | 0.01 |
| 10-- | 0.01 |  |  |  |  |  |  |  |  |  |
On Doubles Becoming Homers
There is an old theory (I think I heard it when I was a kid) that if a young player hits doubles, then, as he ages, those may become home runs, or some of them may become home runs. I think I’ve probably studied this before, but our ability to study these kinds of issues gets stronger all the time, with better computers and better databases and better methods, so I thought I’d take a look at it again.
I started with a matched-set study: Find two players who are identical in all of the other relevant respects, but one of whom hits more doubles than the other one, then look at the rest of their careers. We’re looking to see whether the one who hits more doubles as a young player will, later on, hit more home runs.
Here’s what I did. First of all, I took a spreadsheet which has batting records for every hitter in history (through 2011; I haven’t updated this one, either). From that file, I eliminated all players older than 27 years of age, because we are studying young players, which left me with 27,086 lines of data to work with. Then I eliminated all players with fewer than 500 career plate appearances, to help deal with small-sample distractions; this left me with 11,729 lines of data. I eliminated all players from before 1900 or since 2000: before 1900 because the game was so different and wasn’t actually major league baseball, and since 2000 because many of those players would still be in mid-career. That left me with 8,973 lines of data. Then I eliminated players who had fewer than 200 plate appearances in their most recent season. This left me with 7,840 lines of data.
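The four cuts can be sketched as a single filter over the lines of the spreadsheet. This is only an illustration under stated assumptions: the field names (`age`, `career_pa`, `year`, `season_pa`) are hypothetical, the toy lines are invented rather than taken from the real file, and I am assuming that "since 2000" means the 1999 season is the last one kept.

```python
MAX_AGE = 27
MIN_CAREER_PA = 500
FIRST_YEAR, LAST_YEAR = 1900, 1999   # assumption: the 2000 season itself is excluded
MIN_SEASON_PA = 200

def keep_line(line):
    """True if a player/season line survives all four cuts described above."""
    return (
        line["age"] <= MAX_AGE
        and line["career_pa"] >= MIN_CAREER_PA
        and FIRST_YEAR <= line["year"] <= LAST_YEAR
        and line["season_pa"] >= MIN_SEASON_PA
    )

# Invented toy lines, one failing each cut and one passing all of them
toy_lines = [
    {"player": "A", "year": 1972, "age": 27, "career_pa": 2100, "season_pa": 577},
    {"player": "B", "year": 1895, "age": 24, "career_pa": 900, "season_pa": 450},
    {"player": "C", "year": 1988, "age": 31, "career_pa": 3000, "season_pa": 600},
    {"player": "D", "year": 1990, "age": 23, "career_pa": 350, "season_pa": 350},
    {"player": "E", "year": 1965, "age": 25, "career_pa": 800, "season_pa": 120},
]

kept = [line["player"] for line in toy_lines if keep_line(line)]
print(kept)
```

Applying the cuts in one pass gives the same survivors as applying them one at a time; the running counts in the text only matter for seeing how much each cut removes.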
For those 7,840 player/seasons, I "coded" each player with an 8-element code.
The first element was his age: 23 for a 23-year-old player, etc.
The second element was his plate appearances in the most recent season, divided by 150, rounded down to the nearest integer, so that a player with 200-299 plate appearances would be "1", 300-449 would be "2", 450-599 would be "3", and 600 or more would be "4".
The third element was int(OPS*30); that is, his OPS for the season, times 30, rounded down to the last integer. A player with a .700 OPS would be a 21, .800 would be 24, .900 would be 27, etc.
The fourth element was int(RC27*2); that is, his runs created per 27 outs, times 2, rounded down to the last integer.
The fifth element was int(HR/3); that is, one-third of his home runs, rounded down, so that a player hitting 30 home runs would be at "10", a player hitting 40 home runs would be at "13", etc.
The sixth element was career games played divided by 100, rounded down.
The seventh element was the career percentage of his hits which were home runs, rounded down to the nearest whole integer: 7.29% would be "7", 7.94% would still be "7", etc.
The eighth element was int(OPS*30) again, but with the career OPS, whereas earlier it was with the season’s OPS.
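The eight elements can be sketched as one small function. The function and parameter names are my own, and rounding before taking the floor is an added safeguard (so that a value like .700 times 30 is not pushed just under 21 by floating-point error), not something the study describes. The two sample calls use the Freddie Patek (1972) and Eddie Joost (1943) figures quoted later in this article.

```python
import math

def _floor(x):
    # Round first so float error cannot push a boundary value below its integer
    return math.floor(round(x, 6))

def player_code(age, season_pa, season_ops, rc27, season_hr,
                career_games, career_hr, career_hits, career_ops):
    """Build the 8-element matching code as a dash-separated string."""
    elements = [
        age,                                   # 1: age
        season_pa // 150,                      # 2: season plate appearances / 150
        _floor(season_ops * 30),               # 3: int(OPS*30), season
        _floor(rc27 * 2),                      # 4: int(RC27*2)
        season_hr // 3,                        # 5: int(HR/3)
        career_games // 100,                   # 6: career games / 100
        _floor(100 * career_hr / career_hits), # 7: career HR as a percentage of hits
        _floor(career_ops * 30),               # 8: int(OPS*30), career
    ]
    return "-".join(str(e) for e in elements)

patek_1972 = player_code(27, 577, 0.556, 2.74, 0, 575, 14, 489, 0.629)
joost_1943 = player_code(27, 496, 0.550, 2.64, 2, 567, 13, 441, 0.612)
print(patek_1972, joost_1943)
```

Both calls come out to 27-3-16-5-0-5-2-18, the code the article gives for that pair.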
I then sorted the data to identify players who had identical codes. There were 132 sets of identical codes in the data. For example, Todd Zeile after the 1992 season and Ray Lankford after the 1993 season were both coded 26-3-21-8-2-4-8-22: "26" because they were 26 years old, "3" because they had 450 to 599 plate appearances, "21" because both had OPS between .700 and .733, "8" because each player created between 4.00 and 4.50 runs per 27 outs, "2" because each player had 6 to 8 home runs, "4" because each player had played 400 to 499 games in his career, "8" because each player had hit home runs with 8 to 8.99% of his career hits, and "22" because each player had a career OPS (at that time) between .734 and .766.
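Finding the sets of identical codes is a grouping step. The study sorted a spreadsheet; the sketch below gets the same result with a dictionary keyed on the code, which is my substitution rather than the method actually used. The third entry is an invented, unmatched player included only to show that singletons drop out.

```python
from collections import defaultdict

def matched_sets(coded_players):
    """Group (label, code) pairs by code and keep only codes shared by 2+ players."""
    groups = defaultdict(list)
    for label, code in coded_players:
        groups[code].append(label)
    return {code: labels for code, labels in groups.items() if len(labels) > 1}

coded_players = [
    ("Todd Zeile 1992", "26-3-21-8-2-4-8-22"),
    ("Ray Lankford 1993", "26-3-21-8-2-4-8-22"),
    ("Hypothetical Player 1990", "25-2-20-7-1-3-5-21"),  # invented, has no match
]

sets_found = matched_sets(coded_players)
print(sets_found)
```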
Unfortunately, while the two young Cardinals were a perfect match on these 8 elements, this fact was useless to us because they also had hit almost exactly the same number of doubles in their careers, as a percentage of their hits. What we are looking for is players who are alike in terms of playing time, home run rate, age and overall offensive ability, but different in the number of doubles that they hit.
I love doing matched-set studies because it is always fun to see who matches up against whom. Zeile and Lankford are kind of logical because they were teammates, so they’re linked in time and place. Most players who wind up paired in a matched-set study seem like reasonable matches when you look at it, but 90% of the time they are players you would never think to put together, like Sam Mele (1950) and Felix Jose (1992), or Zack Wheat (1910) with Harvey Kuenn (1953), or Dave Revering (1978) with Ron Cey (1973), or Fran Healy (1973) with Lee Tinsley (1995).
OK, I had 132 sets of players with identical codes, and I had to trim that down to pairs of one player who hit doubles and one player who didn’t. I decided on a minimum separation of .05 doubles as a percentage of hits. In other words, if one player had hit doubles with 17% of his career hits, the other player had to be under 12% or over 22% in order for the pair to be usable. Of the 132 sets of players, 40 were usable pairs.
To argue for the proposition that doubles become homers as the player ages, consider Eddie Joost and Freddie Patek. Both players were coded 27-3-16-5-0-5-2-18:
| First | Last | YEAR | G | AB | R | H | 2B | 3B | HR | RBI | BB | SO | Avg | PA | RC | RC 27 | AGE | OPS |
| Freddie | Patek | 1972 | 136 | 518 | 59 | 110 | 25 | 4 | 0 | 32 | 47 | 64 | .212 | 577 | 44 | 2.74 | 27 | .556 |
| Eddie | Joost | 1943 | 124 | 421 | 34 | 78 | 16 | 3 | 2 | 20 | 68 | 80 | .185 | 496 | 35 | 2.64 | 27 | .550 |
Patek at this point had played 575 games in his career; Joost, 567. Patek had a career OPS of .629; Joost, of .612. Patek had hit 14 homers in his career; Joost had hit 13.
There was, however, this difference: that Patek in his career had hit only 69 doubles among 489 career hits, whereas Joost, with only 441 career hits, had hit 85 doubles. Joost had hit doubles on 19.3% of his career hits, Patek on 14.1%, so it’s a qualifying match.
And, in fact, Joost did go on to develop much more power than Patek. Joost would hit 134 home runs in his career, hitting 13 to 23 homers every season from 1947 to 1952. Patek would never hit more than 6 homers in a season, 41 in his career, but would wind up his career with 8 million fly balls just short of the warning track. Joost’s doubles did, in fact, develop into home runs as he got older and stronger.
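The qualifying test is simple enough to write down. A minimal sketch, with names of my own choosing, using the career doubles and hits just quoted for the two players:

```python
MIN_SEPARATION = 0.05   # doubles as a share of career hits

def doubles_share(doubles, hits):
    """Career doubles divided by career hits."""
    return doubles / hits

def usable_pair(share_a, share_b, min_separation=MIN_SEPARATION):
    """True if the two players differ enough in doubles rate to be compared."""
    return abs(share_a - share_b) >= min_separation

patek = doubles_share(69, 489)
joost = doubles_share(85, 441)
print(f"Patek {patek:.3f}, Joost {joost:.3f}, usable: {usable_pair(patek, joost)}")
```

The gap is a little over five points, so the pair clears the cutoff, though not by much.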
But it doesn’t happen 100% of the time. There are three players in the data who are coded 23-4-22-9-6-1-12-22. Two of them are Willie Jones, 1949, and Ernie Banks, 1954. Both players were 23-year-old rookies with a few prior at bats. Both players hit 19 homers, and their stats were similar enough that they qualify as an exact match. The one who hit 34 doubles was Willie Jones. The one who didn’t hit doubles was Ernie Banks.
Actually, there were three players in that dance; the third player who was coded a 23-4-22-9-6-1-12-22, another 23-year-old rookie with 19 homers, was Ron Gant in 1988. He hit more doubles (as a rookie) than Banks, but less than Jones. In his career, he hit more homers than Jones, but less than Banks. So the "doubles become homers" theory, in that case, is exactly backward; the fewer doubles you hit as a young player, the more home runs.
Let me jump to the results of the first matched-set study. The study suggests that it is more likely than not that there is some validity in the belief that "young players’ doubles" will become homers as the player ages. In this study the two groups of players were nearly identical in terms of career games played, at bats, triples, homers, runs created per 27 outs, OPS, career OPS, career home runs as a percentage of hits, etc. The only difference between them was that "Group A" had hit an average of 22 doubles in the season and 37 doubles in their careers, whereas Group B had hit an average of 14 doubles in the season, 24 in their careers, and the hitters in Group B had offset this with slightly higher numbers of singles and walks, so that their overall offensive productivity was neither greater nor less.
In the rest of their careers, the hitters in Group A would hit an average of 31 more home runs, whereas the players in Group B would hit an average of only 27 more home runs. This, however, is despite the inclusion of Ernie Banks in the "Group B" portion of the study, balanced against Willie Jones in the "Group A" portion. All 80 players in the study hit a total of only 2,337 home runs in the rest of their careers. Ernie Banks himself hit 491 of those. Banks is an elephant in a roomful of kittens. The fact that Group A wins the home-run hitting contest despite the fact that Group B got the elephant suggests that there may be something there.
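To see how much one player distorts a 40-man group, here is a rough piece of arithmetic using only the figures in the paragraph above. The group averages are rounded in the text, so the totals derived from them are approximations, and the variable names are my own.

```python
PLAYERS_PER_GROUP = 40
group_a_avg, group_b_avg = 31, 27     # rounded averages quoted in the text
banks_hr = 491                        # Ernie Banks, rest of career
total_hr = 2337                       # all 80 players, rest of career

# Banks's share of every home run hit by the 80 players
banks_share = banks_hr / total_hr

# Approximate Group B average with Banks removed (uses the rounded 27)
group_b_total = group_b_avg * PLAYERS_PER_GROUP
group_b_avg_without_banks = (group_b_total - banks_hr) / (PLAYERS_PER_GROUP - 1)

print(f"Banks share: {banks_share:.1%}")
print(f"Group B without Banks: about {group_b_avg_without_banks:.1f} HR per player")
```

On these rounded figures, the other 39 players in Group B average roughly half of what the Group A players do.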
OK, that study has terrible flaws. The study suggests that the theory is more likely to be true than false, but the study is so badly flawed that it isn’t really worth very much. What did I do wrong here?
I controlled too many "extraneous" factors. I set up too many controlled parameters, which limited the number of "exact matches" to 132, of which only 40 were usable matches. That gave us only 40 sets of players, which is too few. That was one thing that I did wrong. But I made a second and equally egregious mistake. I didn’t control for minimum hitting competence. I allowed players into the study if they "matched", even if neither one of them could hit worth a damn. Since zeroes are more likely to "match" than any other number, this created a set of 40 matches, MOST of which were between two terrible hitters. Sandy Alomar Sr., 1968, .578 OPS, is matched with Angel Salazar, 1986, .592 OPS. Gabby Street, 1908, .568 OPS, is matched up against Bill Shipke, 1908, .573 OPS. John Sullivan, 1943, .548 OPS, is matched up against Al Bridwell, 1906, .548 OPS. Fritz Mollwitz, 1915, .597 OPS, is matched up with Gus Getz, 1915, .587 OPS. Cesar Gutierrez, 1970, .573 OPS, is matched up with Carson Bigbee, 1970, .573 OPS. Most of the study is guys like this, who have very short and very unimpressive futures because they really can’t hit.
And I made a third mistake, less serious than the other two. In the first study I allowed Zack Wheat, 1910, to be matched up with Harvey Kuenn, 1953. 1953 is not like 1910. There is no power in the game in 1910, so nobody from 1910 is going to develop home-run power over the next five years.
The reason that you control a lot of different inputs, in a matched set study, is to prevent extraneous elements from leaking into the study. You’re kind of guessing what will work, guessing how many inputs you need to control and how many you can get by with controlling and still get good data. You very often guess wrong, and then you just have to back up and start over. So I backed up and started over.
In the second study, I made four changes:
1) I cut the number of controlled parameters from 8 to 3, controlling only for age, career OPS, and the percentage of career hits which were home runs. I did, however, change the formula for the OPS code from int(30*OPS) to int(50*OPS), which means that matched players have to fall into the same 20-point OPS range, rather than the same 33-point OPS range.
2) I eliminated all players from the study who created less than 4.00 runs per 27 outs, so that we would be less likely to be studying the futures of players who have no futures.
3) I put in a rule that a player from before 1920 (the Dead Ball era) could not be matched with a player from after 1920.
4) I cut the maximum age from 27 to 26 (although this was done, actually, after the fact, when it became apparent that it could be done without damaging the study.) I just got tired of marking the matches, and stopped before I got to the 27-year-olds.
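The four changes can be sketched as a shorter code plus two screening rules. All names here are hypothetical, rounding before the floor is my own safeguard against floating-point error, and I am assuming that a 1920 season counts as post-Dead Ball, since the text does not say which side 1920 itself falls on.

```python
import math

def _floor(x):
    # Round first so float error cannot push a boundary value below its integer
    return math.floor(round(x, 6))

def second_study_code(age, career_ops, career_hr, career_hits):
    """3-element code: age, int(50*OPS) on career OPS, career HR as a percentage of hits."""
    return (age, _floor(career_ops * 50), _floor(100 * career_hr / career_hits))

def eligible(age, rc27):
    """Changes 2 and 4: at least 4.00 runs created per 27 outs, age 26 or younger."""
    return age <= 26 and rc27 >= 4.00

def same_era(year_a, year_b, cutoff=1920):
    """Change 3: both seasons before the cutoff, or both from the cutoff onward (assumed)."""
    return (year_a < cutoff) == (year_b < cutoff)

# int(50*OPS) bins are 20 points wide: .740 through .759 share a code, .760 starts the next
print(second_study_code(25, 0.745, 25, 341))
print(same_era(1910, 1953))   # the Wheat/Kuenn pairing from the first study
```

Under these rules the Zack Wheat and Harvey Kuenn pairing from the first study would no longer be allowed.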
In the second study I wound up with 550 matched sets of players. Despite cutting the number of controlled parameters from 8 to 3, I see no evidence of any extraneous factors leaking into the study. The players in the aggregate seem as well matched as before.
To cut to the chase: the theory that doubles by young players are predictive of future home runs appears to be clearly true, but of limited or no value in assessing the overall worth of the players. This chart compares the group averages of the two sets of players, 550 players in each group, in the base season of the study:
|  | G | AB | R | H | 2B | 3B | HR | RBI | BB | SO | SB | CS | Avg. |
| Group A | 129 | 460 | 65 | 129 | 27 | 5 | 10 | 60 | 44 | 57 | 10 | 4 | .279 |
| Group B | 132 | 474 | 69 | 136 | 18 | 6 | 10 | 58 | 47 | 53 | 15 | 5 | .286 |
The players in Group A had a .768 OPS (.279/.345/.423), whereas the players in Group B had a .769 OPS (.286/.353/.416). They were identical in age, the average player in each group being 24.658 years of age. This chart compares their career batting totals at that time.
|  | G | AB | R | H | 2B | 3B | HR | RBI | BB | SO | SB | CS | Avg |
| Group A | 357 | 1240 | 170 | 341 | 70 | 13 | 25 | 158 | 113 | 152 | 28 | 11 | .273 |
| Group B | 368 | 1280 | 184 | 361 | 48 | 17 | 25 | 151 | 121 | 147 | 41 | 14 | .280 |
The players in each group had a career OPS of .745 at that time--.273/.337/.408 for Group A, .280/.344/.401 for Group B.
In the rest of their careers, the players in Group A hit an average of 91 more home runs. The players in Group B hit an average of only 76. Since there are 550 players in each group, the difference between them is more than 8,000 career home runs. This cannot reasonably be suspected of being a random outcome. The players who hit doubles as young players did, in fact, go on to hit more home runs later on.
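The size of that gap follows directly from the two averages. A short sketch of the arithmetic, with variable names of my own; the averages are the rounded figures from the text, so the total is approximate:

```python
players_per_group = 550
group_a_avg_hr = 91   # rest-of-career home runs per player, Group A
group_b_avg_hr = 76   # rest-of-career home runs per player, Group B

total_gap = (group_a_avg_hr - group_b_avg_hr) * players_per_group
relative_gap = group_a_avg_hr / group_b_avg_hr - 1

print(f"Total gap: about {total_gap:,} home runs")
print(f"Group A hit about {relative_gap:.0%} more per player")
```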
But here’s the thing. It does not appear that the players who gain power derive any significant career advantage from the increase in power. This chart summarizes the "rest of career" performance for the two groups of players:
|  | G | AB | R | H | 2B | 3B | HR | RBI | BB | SO | SB | CS | Avg |
| Group A | 962 | 3334 | 477 | 929 | 175 | 27 | 91 | 466 | 362 | 424 | 58 | 25 | .267 |
| Group B | 986 | 3359 | 479 | 935 | 153 | 33 | 76 | 419 | 368 | 388 | 85 | 32 | .267 |
The players who hit more doubles did have an 11-point edge in OPS for the remainder of their careers, .739 to .728 (.267/.338/.401 for Group A, .267/.338/.390 for Group B). However, they played no more games in the rest of their careers, had no more plate appearances, scored no more runs.
The players who hit more doubles as young players were a little stronger; the other players were a little bit faster. It doesn’t appear that one is meaningfully better than the other, in terms of projecting a player’s future—different, but not better.