To what extent does a team that controls the strike zone also control the game?
Our first task here is to state a team’s strikeouts and walks—offensive and defensive—as wins and losses. You may remember that three years ago I proposed a method to state a pitcher’s strikeouts and walks as wins and losses. That method was to figure the league average of strikeouts and walks per 18 innings, there being 18 innings in a game, one win and one loss. Each group of strikeouts equal to the league average per 18 innings was one "win" for a pitcher, and each group of walks was one loss. The article that discussed that method was posted here on March 30, 2009, and there was a follow-up article on the same method a few days later; you can still read those articles if you are interested.
To state a team’s strike zone control as a winning percentage, then, I started with something spun off of that method:
Each group of walks by batters equal to 36 innings worth of walks at the league rate was treated as one offensive win.
Each group of strikeouts equal to 36 innings of strikeouts was treated as one offensive loss.
Each group of strikeouts by pitchers equal to 36 innings worth was treated as one win for the pitching.
Each group of walks by pitchers was treated as one loss.
I’ll demonstrate that method with the 2011 Philadelphia Phillies, but I’ll warn you in advance that that didn’t quite work, so this isn’t the actual method I’m using here; I’m just walking you toward that. The team with the best strikeout/walk data in the majors last year was the Phillies. Philadelphia hitters drew 539 walks in 2011, and the league average per 36 innings was 12.5 walks. 539 divided by 12.5 is 43.1, so that’s 43 "wins" for the Philadelphia offense.
Philadelphia hitters struck out 1,024 times, and the league average per 36 innings was 29.2 strikeouts. 1024 divided by 29.2 is 35.1, so that’s 35 "strike zone losses" for the Philadelphia offense.
Philadelphia pitchers struck out 1,299 batters. That figures out to 44.53 "strike zone wins" for the Philadelphia pitchers.
Philly pitchers walked 404 batters. That figures out to 32.3 "strike zone losses" for the Philadelphia pitchers.
Adding together the batters (43-35) and the pitchers (45-32), the Phillies had a "strike zone won-lost record" of 88-67, a .565 winning percentage, which was the best in baseball.
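The arithmetic of this first attempt can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name is hypothetical, and the league rates are the rounded figures quoted above (12.5 walks and 29.2 strikeouts per 36 innings), so the component totals can differ from the article's in the second decimal place.

```python
def first_attempt_record(bb_bat, so_bat, so_pit, bb_pit, lg_bb_per_36, lg_so_per_36):
    """First-attempt strike zone record (hypothetical helper, not the final method).

    Walks drawn and pitcher strikeouts count toward wins; batter strikeouts
    and walks issued count toward losses, each scaled by the league rate
    per 36 innings.
    """
    off_wins = bb_bat / lg_bb_per_36      # walks drawn by hitters
    off_losses = so_bat / lg_so_per_36    # strikeouts by hitters
    pit_wins = so_pit / lg_so_per_36      # strikeouts by pitchers
    pit_losses = bb_pit / lg_bb_per_36    # walks issued by pitchers
    wins = off_wins + pit_wins
    losses = off_losses + pit_losses
    return wins, losses, wins / (wins + losses)


# 2011 Phillies totals from the text; league rates are the rounded values quoted there.
wins, losses, pct = first_attempt_record(539, 1024, 1299, 404, 12.5, 29.2)
print(round(wins), round(losses), round(pct, 3))
```

With these inputs the sketch reproduces the 88-67 record and the .565 percentage.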
The problem with that method was that the standard deviation of wins and losses by this method was unrealistically small (meaning, for those of you who don’t speak math, that there was a severe shortage of teams with very good records or very poor records—as is suggested by the fact that the best record in baseball in 2011 was 88-67.) Taking all major league teams since 1980, the standard deviation of winning percentage is 68 points. Using this method, the standard deviation of winning percentage was 28 points—not even in the right ballpark.
I knew immediately why that was true, of course. There isn’t a straight-line relationship between wins and elements of wins in baseball; it’s always a relationship of squares, or something more like that. There are a million ways to decentralize the strike zone winning percentages, imitating the relationship of squares, but what I decided to do was this. First, I divided each team’s strikeouts and walks by the league norms per 18 innings, rather than per 36 innings as I was doing before. This makes the Philadelphia won-lost record 175-135, rather than 88-67. Then I subtracted one-fourth of the "K Zone decisions" from both the wins and the losses. 175-135 is 310 K Zone decisions. One-fourth of that is 77 or 78, something in there: 77.55. Subtract that from the wins and from the losses, and Philadelphia’s Strike Zone Team Winning Percentage is .630 (98-57). Coincidentally, this is the same as their actual winning percentage, .630 (102-60); that doesn’t happen very often.
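A minimal Python sketch of the revised calculation follows. The function name is hypothetical, and the per-18-inning league rates are assumed to be half of the rounded per-36-inning figures quoted earlier (12.5 walks, 29.2 strikeouts), so intermediate values differ slightly from the 77.55 given above.

```python
def strike_zone_record(bb_bat, so_bat, so_pit, bb_pit, lg_bb_per_18, lg_so_per_18):
    """Revised strike zone record (hypothetical helper).

    Step 1: scale each total by the league rate per 18 innings.
    Step 2: subtract one-fourth of the total decisions from wins and losses.
    """
    wins = bb_bat / lg_bb_per_18 + so_pit / lg_so_per_18
    losses = so_bat / lg_so_per_18 + bb_pit / lg_bb_per_18
    quarter = (wins + losses) / 4
    adj_wins = wins - quarter
    adj_losses = losses - quarter
    raw_pct = wins / (wins + losses)
    adj_pct = adj_wins / (adj_wins + adj_losses)
    return adj_wins, adj_losses, raw_pct, adj_pct


# 2011 Phillies; league rates assumed to be half the rounded per-36 figures.
adj_wins, adj_losses, raw_pct, adj_pct = strike_zone_record(
    539, 1024, 1299, 404, 12.5 / 2, 29.2 / 2
)
print(round(adj_wins), round(adj_losses), round(adj_pct, 3))
```

Algebraically, removing a quarter of the decisions from each side gives an adjusted percentage of 2 × (raw percentage) − .500, so every team's distance from .500 doubles. That is why a .565 team becomes a .630 team, and why the spread of the percentages roughly doubles as well.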
This "revised" method is actually more consistent with the original method, proposed in the article in March, 2009, than was the first attempt. That method based the pitcher’s won-lost record on his strikeouts and walks per 18 innings. This method is the same. If a pitcher was 16-10 in that method, his individual won-lost contribution would be 16-10 here. We’ve just "adjusted" that by overlaying the success of the hitters on top of the success of the pitchers—in essence saying that a pitcher who deserved to be 16-10 might be 18-8 if the hitters on his team were also good.
This leads to a series of questions:
1) What are the best and worst strike zone won-lost records of recent years?
2) How well does the strike zone won-lost record correlate with the team’s "overall" or "actual" won-lost record?, and
3) Are the anomalies at all predictive?
1) What are the best and worst strike zone won-lost records of recent years?
These are the 25 teams with the best control of the strike zone, offensive and defensive, since 1980:
Rank | Year | Tm | Lg | Wins | Losses | KZ W Pct
1 | 2002 | ARI | NL | 107 | 54 | .664
2 | 2003 | NYY | AL | 114 | 58 | .660
3 | 1981 | NYY | AL | 72 | 37 | .660
4 | 1988 | BOS | AL | 111 | 58 | .656
5 | 1991 | NYM | NL | 100 | 53 | .651
6 | 2006 | MIN | AL | 95 | 51 | .650
7 | 1986 | BOS | AL | 99 | 54 | .649
8 | 1988 | NYM | NL | 106 | 59 | .640
9 | 1994 | CHW | AL | 72 | 41 | .639
10 | 1990 | NYM | NL | 106 | 60 | .639
11 | 1996 | CLE | AL | 95 | 56 | .631
12 | 2011 | PHI | NL | 98 | 57 | .630
13 | 1983 | PHI | NL | 108 | 65 | .625
14 | 1981 | CLE | AL | 64 | 39 | .625
15 | 2004 | NYY | AL | 101 | 61 | .622
16 | 1989 | BOS | AL | 108 | 66 | .620
17 | 2007 | MIN | AL | 88 | 55 | .616
18 | 2001 | OAK | AL | 102 | 64 | .615
19 | 2004 | SDP | NL | 90 | 56 | .615
20 | 2007 | BOS | AL | 105 | 66 | .615
21 | 1980 | NYY | AL | 106 | 67 | .615
22 | 1999 | HOU | NL | 103 | 65 | .614
23 | 1991 | PIT | NL | 95 | 60 | .614
24 | 2010 | MIN | AL | 90 | 57 | .613
25 | 1994 | MON | NL | 68 | 43 | .613
And these are the actual won-lost records of those teams:
Rank | Year | Tm | Lg | KZ Wins | KZ Losses | KZ W Pct | Actual W | Actual L | Actual W-L%
1 | 2002 | ARI | NL | 107 | 54 | .664 | 98 | 64 | .605
2 | 2003 | NYY | AL | 114 | 58 | .660 | 101 | 61 | .623
3 | 1981 | NYY | AL | 72 | 37 | .660 | 59 | 48 | .551
4 | 1988 | BOS | AL | 111 | 58 | .656 | 89 | 73 | .549
5 | 1991 | NYM | NL | 100 | 53 | .651 | 77 | 84 | .478
6 | 2006 | MIN | AL | 95 | 51 | .650 | 96 | 66 | .593
7 | 1986 | BOS | AL | 99 | 54 | .649 | 95 | 66 | .590
8 | 1988 | NYM | NL | 106 | 59 | .640 | 100 | 60 | .625
9 | 1994 | CHW | AL | 72 | 41 | .639 | 67 | 46 | .593
10 | 1990 | NYM | NL | 106 | 60 | .639 | 91 | 71 | .562
11 | 1996 | CLE | AL | 95 | 56 | .631 | 99 | 62 | .615
12 | 2011 | PHI | NL | 98 | 57 | .630 | 102 | 60 | .630
13 | 1983 | PHI | NL | 108 | 65 | .625 | 90 | 72 | .556
14 | 1981 | CLE | AL | 64 | 39 | .625 | 52 | 51 | .505
15 | 2004 | NYY | AL | 101 | 61 | .622 | 101 | 61 | .623
16 | 1989 | BOS | AL | 108 | 66 | .620 | 83 | 79 | .512
17 | 2007 | MIN | AL | 88 | 55 | .616 | 79 | 83 | .488
18 | 2001 | OAK | AL | 102 | 64 | .615 | 102 | 60 | .630
19 | 2004 | SD | NL | 90 | 56 | .615 | 87 | 75 | .537
20 | 2007 | BOS | AL | 105 | 66 | .615 | 96 | 66 | .593
21 | 1980 | NYY | AL | 106 | 67 | .615 | 103 | 59 | .636
22 | 1999 | HOU | NL | 103 | 65 | .614 | 97 | 65 | .599
23 | 1991 | PIT | NL | 95 | 60 | .614 | 98 | 64 | .605
24 | 2010 | MIN | AL | 90 | 57 | .613 | 94 | 68 | .580
25 | 1994 | MON | NL | 68 | 43 | .613 | 74 | 40 | .649
The Arizona team with the complete dominance of the strike zone was the Schilling/Big Unit team, led by two pitchers with fantastic strikeout/walk ratios. It’s not just Johnson and Schilling, though; that team also led the league in walks drawn by hitters, and was fourth in the league in fewest strikeouts by hitters.
The teams with the least control of the strike zone, over the same years, are these, with their actual won-lost records:
Rank | Year | Tm | Lg | KZ Wins | KZ Losses | KZ W Pct | Actual W | Actual L | Actual W-L%
1 | 1983 | NYM | NL | 51 | 107 | .325 | 68 | 94 | .420
2 | 1996 | DET | AL | 60 | 116 | .341 | 53 | 109 | .327
3 | 2001 | MIL | NL | 62 | 113 | .353 | 68 | 94 | .420
4 | 2003 | DET | AL | 55 | 100 | .354 | 43 | 119 | .265
5 | 2003 | TB | AL | 57 | 104 | .354 | 63 | 99 | .389
6 | 1982 | NYM | NL | 62 | 106 | .370 | 65 | 97 | .401
7 | 1986 | CLE | AL | 56 | 95 | .370 | 84 | 78 | .519
8 | 2007 | TEX | AL | 64 | 107 | .373 | 75 | 87 | .463
9 | 2002 | DET | AL | 51 | 86 | .373 | 55 | 106 | .342
10 | 1999 | FLA | NL | 58 | 96 | .374 | 64 | 98 | .395
11 | 2002 | TB | AL | 62 | 102 | .377 | 55 | 106 | .342
12 | 1980 | TOR | AL | 63 | 104 | .377 | 67 | 95 | .414
13 | 1981 | TOR | AL | 40 | 66 | .378 | 37 | 69 | .349
14 | 1990 | NYY | AL | 62 | 102 | .379 | 67 | 95 | .414
15 | 1993 | COL | NL | 60 | 98 | .380 | 67 | 95 | .414
16 | 2003 | CIN | NL | 64 | 103 | .382 | 69 | 93 | .426
17 | 2005 | TB | AL | 63 | 100 | .387 | 67 | 95 | .414
18 | 2006 | KC | AL | 63 | 99 | .389 | 62 | 100 | .383
19 | 2006 | TB | AL | 64 | 99 | .390 | 61 | 101 | .377
20 | 2004 | COL | NL | 68 | 104 | .394 | 68 | 94 | .420
21 | 1998 | FLA | NL | 67 | 103 | .395 | 54 | 108 | .333
22 | 2006 | PIT | NL | 65 | 99 | .396 | 67 | 95 | .414
23 | 2008 | PIT | NL | 61 | 93 | .396 | 67 | 95 | .414
24 | 2000 | MIL | NL | 68 | 102 | .398 | 73 | 89 | .451
25 | 2005 | KC | AL | 64 | 97 | .399 | 56 | 106 | .346
This chart summarizes the top and bottom teams from the last ten years:
Rank | Year | Tm | Lg | KZ Wins | KZ Losses | KZ W Pct | Actual W | Actual L | Actual W-L%
1 | 2002 | BOS | AL | 97 | 62 | .609 | 93 | 69 | .574
2 | 2002 | SEA | AL | 99 | 66 | .602 | 93 | 69 | .574
13 | 2002 | TB | AL | 62 | 102 | .377 | 55 | 106 | .342
14 | 2002 | DET | AL | 51 | 86 | .373 | 55 | 106 | .342
1 | 2002 | ARI | NL | 107 | 54 | .664 | 98 | 64 | .605
2 | 2002 | SF | NL | 85 | 69 | .551 | 95 | 66 | .590
15 | 2002 | COL | NL | 64 | 86 | .429 | 73 | 89 | .451
16 | 2002 | MIL | NL | 66 | 97 | .405 | 56 | 106 | .346
1 | 2003 | NYY | AL | 114 | 58 | .660 | 101 | 61 | .623
2 | 2003 | BOS | AL | 105 | 68 | .607 | 95 | 67 | .586
13 | 2003 | TB | AL | 57 | 104 | .354 | 63 | 99 | .389
14 | 2003 | DET | AL | 55 | 100 | .354 | 43 | 119 | .265
1 | 2003 | ARI | NL | 93 | 71 | .568 | 84 | 78 | .519
2 | 2003 | STL | NL | 82 | 70 | .539 | 85 | 77 | .525
15 | 2003 | NYM | NL | 64 | 87 | .426 | 66 | 95 | .410
16 | 2003 | CIN | NL | 64 | 103 | .382 | 69 | 93 | .426
1 | 2004 | NYY | AL | 101 | 61 | .622 | 101 | 61 | .623
2 | 2004 | BOS | AL | 100 | 73 | .579 | 98 | 64 | .605
13 | 2004 | KC | AL | 63 | 85 | .427 | 58 | 104 | .358
14 | 2004 | TOR | AL | 69 | 93 | .426 | 67 | 94 | .416
1 | 2004 | SD | NL | 90 | 56 | .615 | 87 | 75 | .537
2 | 2004 | SF | NL | 98 | 64 | .604 | 91 | 71 | .562
15 | 2004 | NYM | NL | 68 | 93 | .421 | 71 | 91 | .438
16 | 2004 | COL | NL | 68 | 104 | .394 | 68 | 94 | .420
1 | 2005 | NYY | AL | 100 | 71 | .583 | 95 | 67 | .586
2 | 2005 | BOS | AL | 100 | 72 | .583 | 95 | 67 | .586
13 | 2005 | KC | AL | 64 | 97 | .399 | 56 | 106 | .346
14 | 2005 | TB | AL | 63 | 100 | .387 | 67 | 95 | .414
1 | 2005 | PHI | NL | 100 | 71 | .584 | 88 | 74 | .543
2 | 2005 | SD | NL | 95 | 69 | .581 | 82 | 80 | .506
15 | 2005 | COL | NL | 70 | 94 | .428 | 67 | 95 | .414
16 | 2005 | PIT | NL | 64 | 96 | .401 | 67 | 95 | .414
1 | 2006 | MIN | AL | 95 | 51 | .650 | 96 | 66 | .593
2 | 2006 | BOS | AL | 101 | 74 | .576 | 86 | 76 | .531
13 | 2006 | TB | AL | 64 | 99 | .390 | 61 | 101 | .377
14 | 2006 | KC | AL | 63 | 99 | .389 | 62 | 100 | .383
1 | 2006 | LA | NL | 90 | 66 | .577 | 88 | 74 | .543
2 | 2006 | HOU | NL | 92 | 70 | .567 | 82 | 80 | .506
15 | 2006 | FLA | NL | 70 | 100 | .410 | 78 | 84 | .481
16 | 2006 | PIT | NL | 65 | 99 | .396 | 67 | 95 | .414
1 | 2007 | MIN | AL | 88 | 55 | .616 | 79 | 83 | .488
2 | 2007 | BOS | AL | 105 | 66 | .615 | 96 | 66 | .593
13 | 2007 | KC | AL | 65 | 84 | .434 | 69 | 93 | .426
14 | 2007 | TEX | AL | 64 | 107 | .373 | 75 | 87 | .463
1 | 2007 | LA | NL | 88 | 65 | .574 | 82 | 80 | .506
2 | 2007 | HOU | NL | 85 | 74 | .533 | 73 | 89 | .451
15 | 2007 | WSH | NL | 68 | 91 | .428 | 73 | 89 | .451
16 | 2007 | FLA | NL | 73 | 108 | .403 | 71 | 91 | .438
1 | 2008 | TOR | AL | 90 | 64 | .586 | 86 | 76 | .531
2 | 2008 | CHW | AL | 89 | 67 | .570 | 89 | 74 | .546
13 | 2008 | SEA | AL | 64 | 86 | .427 | 61 | 101 | .377
14 | 2008 | BAL | AL | 68 | 96 | .414 | 68 | 93 | .422
1 | 2008 | LA | NL | 88 | 66 | .570 | 84 | 78 | .519
2 | 2008 | CHC | NL | 96 | 78 | .553 | 97 | 64 | .602
15 | 2008 | FLA | NL | 74 | 97 | .431 | 84 | 77 | .522
16 | 2008 | PIT | NL | 61 | 93 | .396 | 67 | 95 | .414
1 | 2009 | NYY | AL | 109 | 76 | .590 | 103 | 59 | .636
2 | 2009 | BOS | AL | 106 | 78 | .577 | 95 | 67 | .586
13 | 2009 | SEA | AL | 68 | 89 | .432 | 85 | 77 | .525
14 | 2009 | TEX | AL | 69 | 97 | .417 | 87 | 75 | .537
1 | 2009 | ATL | NL | 92 | 70 | .568 | 86 | 76 | .531
2 | 2009 | LA | NL | 92 | 75 | .553 | 95 | 67 | .586
15 | 2009 | WSH | NL | 70 | 93 | .431 | 59 | 103 | .364
16 | 2009 | PIT | NL | 62 | 86 | .416 | 62 | 99 | .385
1 | 2010 | MIN | AL | 90 | 57 | .613 | 94 | 68 | .580
2 | 2010 | TB | AL | 101 | 79 | .562 | 96 | 66 | .593
13 | 2010 | CLE | AL | 72 | 93 | .439 | 69 | 93 | .426
14 | 2010 | BAL | AL | 65 | 83 | .438 | 66 | 96 | .407
1 | 2010 | PHI | NL | 90 | 60 | .599 | 97 | 65 | .599
2 | 2010 | ATL | NL | 96 | 70 | .579 | 91 | 71 | .562
15 | 2010 | ARI | NL | 74 | 99 | .428 | 65 | 97 | .401
16 | 2010 | PIT | NL | 64 | 87 | .422 | 57 | 105 | .352
1 | 2011 | CHW | AL | 88 | 66 | .573 | 79 | 83 | .488
2 | 2011 | NYY | AL | 101 | 76 | .572 | 97 | 65 | .599
13 | 2011 | MIN | AL | 66 | 80 | .451 | 63 | 99 | .389
14 | 2011 | BAL | AL | 69 | 88 | .440 | 69 | 93 | .426
1 | 2011 | PHI | NL | 98 | 57 | .630 | 102 | 60 | .630
2 | 2011 | STL | NL | 87 | 64 | .577 | 90 | 72 | .556
15 | 2011 | HOU | NL | 67 | 91 | .425 | 56 | 106 | .346
16 | 2011 | PIT | NL | 68 | 94 | .419 | 72 | 90 | .444
2) How well does the strike zone won-lost record correlate with the team’s "overall" or "actual" won-lost record?
Of course, you can see the answer to that question in the charts above. Teams with poor control of the strike zone rarely win, and never win big; teams with good control of the strike zone almost always win. I sorted teams by their strike zone winning percentage, and then figured the actual winning percentage of the teams in each group:
Range | # of Teams | Strike Zone W | Strike Zone L | Strike Zone Pct | Overall W | Overall L | Overall Pct
.600 and Up | 38 | 3696 | 2239 | .623 | 3414 | 2526 | .575
.575 to .599 | 48 | 4460 | 3178 | .584 | 4360 | 3214 | .576
.550 to .574 | 83 | 7384 | 5768 | .561 | 7342 | 5800 | .559
.525 to .549 | 128 | 10951 | 9492 | .536 | 10997 | 9449 | .538
.500 to .524 | 159 | 12776 | 12211 | .511 | 12829 | 12297 | .511
.475 to .499 | 136 | 10408 | 10927 | .488 | 10398 | 11240 | .481
.450 to .474 | 120 | 8636 | 10035 | .463 | 8842 | 10005 | .469
.425 to .449 | 120 | 8247 | 10600 | .438 | 8499 | 10502 | .447
.400 to .424 | 41 | 2634 | 3747 | .413 | 2791 | 3597 | .437
Under .400 | 25 | 1510 | 2503 | .376 | 1575 | 2417 | .395
At the margins, of course, the charts turn in; teams with .600+ Strike Zone winning percentages do not have actual .600+ winning percentages. This normally happens with charts of this nature. With that exception, however, the strike zone winning percentage is a very good predictor of a team’s actual winning percentage.
77% of teams with strike zone winning percentages of .500 and above have actual winning percentages of .500 and above, and 77% of teams with strike zone winning percentages under .500 have actual winning percentages under .500.
If you took all the team winning percentages in the study and all of the strike zone winning percentages in the study and randomly re-aligned them, you would get an average discrepancy between the two of .071, or 71 points. The actual average discrepancy between the two is .0416, or 42 points.
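The comparison between the actual pairing and a random re-alignment can be sketched as follows. The function names are hypothetical. Rather than shuffling repeatedly, the sketch uses the fact that the expected discrepancy under a uniformly random re-alignment equals the average absolute difference over every possible cross-pairing. The sample is only the eight 2011 teams listed in the chart above, which are the extremes of each league, so the figures it produces will not match the .042 and .071 of the full study.

```python
def mean_abs_discrepancy(kz_pcts, actual_pcts):
    """Average |strike zone pct - actual pct| with teams paired as they really were."""
    return sum(abs(k - a) for k, a in zip(kz_pcts, actual_pcts)) / len(kz_pcts)


def expected_random_discrepancy(kz_pcts, actual_pcts):
    """Expected average discrepancy if the two lists were randomly re-aligned.

    Under a uniformly random pairing, each strike zone percentage is equally
    likely to meet each actual percentage, so the expectation is the mean of
    all n * n cross differences.
    """
    n = len(kz_pcts)
    return sum(abs(k - a) for k in kz_pcts for a in actual_pcts) / (n * n)


# Illustrative sample only: the eight 2011 teams from the chart above
# (CHW, NYY, MIN, BAL, PHI, STL, HOU, PIT).
kz = [0.573, 0.572, 0.451, 0.440, 0.630, 0.577, 0.425, 0.419]
actual = [0.488, 0.599, 0.389, 0.426, 0.630, 0.556, 0.346, 0.444]

observed = mean_abs_discrepancy(kz, actual)
shuffled = expected_random_discrepancy(kz, actual)
print(round(observed, 4), round(shuffled, 4))
```

The smaller the observed figure is relative to the shuffled one, the more information the strike zone percentage carries about the actual record.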
(The edges curl under, in a chart like this, for this reason. If you look at teams with "expected" winning percentages of .525 to .549, most will have actual winning percentages in the same range, but some will do better and some will do worse due to what we could characterize as random divergence, the term "random" in this context not implying that there is not an underlying cause for the divergence, but merely that there is no predictable relationship between that underlying cause and the strike zone winning percentage. But if you look at .600 teams, because there are many, many more teams with winning percentages under .600 than over .650, there is thus much more random divergence in a downward direction than in an upward direction—thus, it is generally impossible for groups of teams near the margins of the chart to have a "faithful" winning percentage. This happens with almost all charts of this nature.)
3) Are the anomalies at all predictive?
The largest anomalies in our data (1980 to the present) are for the 1981 Oakland A’s and the 1991 New York Mets. The 1981 A’s, a Billy Martin team, had a strike zone winning percentage of .416 (49-69), but an actual winning percentage of .587 (64-45). The 1991 Mets (The Worst Team Money Could Buy) had a strike zone winning percentage of .651 (100-53), but an actual won-lost percentage of .478 (77-84). (The book, The Worst Team Money Could Buy, is actually about the ’92 Mets, not the ’91 Mets.)
Those are the only two teams in our data which had winning percentages 150 points better or worse than their strike zone winning percentage. 6% of teams had strike zone winning percentages 100 points better or worse than their actual winning percentage, and 31% had discrepancies of 50 points or more.
The question of whether these anomalies are predictive is really two questions:
1) Are discrepancies in the strike zone winning percentage consistent from year to year?, and
2) Are these discrepancies predictive of actual changes in the performance of the teams in subsequent seasons?
The answer to both questions is "Yes"—clearly, obviously, and to a surprising extent. Teams that underperform or overperform their strike zone winning percentage in one season have a clear tendency to do so again the next season. About 59% of teams which overperform their strike zone winning percentage in one season will do so again the next season, and vice versa. The reason for this is obvious. If you outperform your strike zone winning percentage, your team is probably strong in those areas of performance unmeasured by the strike zone, like speed and defense. If your team is strong in those areas in one season, they have a good chance of being strong in those areas in the next season.
But the more meaningful issue is the second one: Discrepancies between strike zone and actual winning percentages are clearly and obviously predictive of the future performance of the teams. 60% of teams which have better won-lost records than strike-zone won-lost records will perform worse in the following season—and vice versa—and the movements here are much larger than those measured in the previous query.
You remember the 1981 A’s, who went 64-45 despite a .416 strike zone winning percentage? The next season their actual winning percentage was .420.
The second-highest "positive discrepancy" was for the 1986 Cleveland Indians, who went 84-78 despite a strike zone winning percentage of .370, making them +.142. The next season Sports Illustrated predicted that the Indians would be the best team in the American League. Their actual winning percentage was .377 (61-101).
I said that the predictive power of this discrepancy was "obvious". What I meant is that if you line up the teams which most exceeded their strike zone winning percentage, it is obvious that they all had poor seasons the next year. Of the 30 teams in our study which exceeded their strike zone winning percentage by the widest margins, 28 suffered declines in winning percentage in the following season. Of the 30 teams which fell short of their strike zone winning percentage by the widest margins, 22 did better in the following season.
A couple of cautionary points. Some of this "predictive" power would be explained simply by the tendency of all teams to return toward .500; i.e., good teams getting worse, bad teams getting better. Second, some of this predictive power is probably a redundant statement of something we already knew, which is that teams which exceed their Pythagorean expectation tend to be unable to sustain that from year to year (that is, the teams which had better won-lost records than strike-zone won-lost records probably also had better won-lost records than would have been expected based on their runs scored and runs allowed—thus, could have been expected to decline without any focus on the strike zone won-lost record.)
I have not yet had the time to tease out these issues, to figure out to what extent the predictive power of this discrepancy is redundant of things that we already knew. My impression is that there is something new and worthwhile here, but I haven’t yet had time to study it.
Thanks for reading,
Bill James
Addendum
OK, after I wrote this I was stuck in an airport for several hours with nothing to do except work, so I worked on the problem of distinguishing the predictive tendency of the strike zone winning percentage from the previously known factors of the Pythagorean discrepancy and the tendency of all teams to drift toward .500.
To eliminate those problems, I established an expectation for each team based on their Pythagorean winning percentage in the previous season. If their Pythagorean winning percentage in the previous season was .600, their expected winning percentage was .550; if the Pythagorean percentage in the previous season was .550, their expected winning percentage was .525; .400 becomes .450, and so on. In this way, we had expectations for each team which removed the previously known predictive biases, or rather the two which are relevant to this study.
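The three examples given (.600 to .550, .550 to .525, .400 to .450) are all consistent with moving the previous season's Pythagorean percentage halfway back to .500. A minimal sketch under that reading follows; the function names are hypothetical, and the halfway rule is inferred from the examples rather than stated as a formula in the text.

```python
def expected_next_pct(prev_pythag_pct):
    """Baseline expectation: last season's Pythagorean pct, regressed halfway to .500.

    Assumption: the halfway rule is inferred from the worked examples
    (.600 -> .550, .550 -> .525, .400 -> .450).
    """
    return 0.5 + (prev_pythag_pct - 0.5) / 2


def performance_vs_expectation(next_actual_pct, prev_pythag_pct):
    """Positive means the team beat its baseline expectation the following season."""
    return next_actual_pct - expected_next_pct(prev_pythag_pct)


for pythag in (0.600, 0.550, 0.400):
    print(pythag, round(expected_next_pct(pythag), 3))
```

Measuring next-season results against this baseline, instead of against the previous season's record, is what strips out the drift toward .500 and the Pythagorean discrepancy.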
This adjustment removed most, but not all, of the predictive power of the Strike Zone Winning Percentage. Even adjusting for those factors, there was some tendency of teams to move, in the following season, in the direction of their strike zone winning percentage.
I mentioned above that of the 30 teams in our study which exceeded their strike zone winning percentage by the widest margins, 28 suffered declines in winning percentage in the following season. Adjusting for these other biases, that becomes 19 out of 30.
I sorted all teams in the study (all teams 1980-2010, excluding 2011 because we don’t yet know the next-season performance for the 2011 teams) into 7 groups, leaving 124 teams in each group. Group 1 was the teams that over-achieved relative to their strike zone winning percentages by the widest margins; Group 7 was the teams that under-achieved by the widest margins.
When I did this, the Group 1 teams under-achieved relative to expectations in the following season by 1.5%, or 1.2 games per team—far larger than the under-achievement rate of any of the other six groups. In other words, the teams that over-achieved in Season One, relative to the strike zone winning percentage, under-achieved in Season Two, relative to their baseline expectation.
The Group 7 teams, in turn, over-achieved in the following season, by 0.7%. Essentially, it appears to me, pending further study, that there is some predictive value in the strike zone winning percentage.
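The grouping procedure can be sketched like this. Everything here is illustrative: the function name and dictionary keys are hypothetical, and the six teams are made-up numbers chosen only to show the mechanics, not data from the study.

```python
def group_residuals(teams, n_groups):
    """Sort teams by (actual pct - strike zone pct), split into equal groups,
    and return each group's average next-season result relative to expectation.

    Group 1 (index 0) holds the biggest over-achievers relative to the strike
    zone percentage; the last group holds the biggest under-achievers.
    """
    ranked = sorted(teams, key=lambda t: t["actual"] - t["kz"], reverse=True)
    size = len(ranked) // n_groups
    means = []
    for g in range(n_groups):
        chunk = ranked[g * size:(g + 1) * size]
        residuals = [t["next_actual"] - t["next_expected"] for t in chunk]
        means.append(sum(residuals) / len(residuals))
    return means


# Hypothetical teams, for illustration only.
teams = [
    {"actual": 0.560, "kz": 0.480, "next_actual": 0.500, "next_expected": 0.530},
    {"actual": 0.520, "kz": 0.470, "next_actual": 0.495, "next_expected": 0.505},
    {"actual": 0.500, "kz": 0.510, "next_actual": 0.505, "next_expected": 0.500},
    {"actual": 0.480, "kz": 0.500, "next_actual": 0.490, "next_expected": 0.495},
    {"actual": 0.450, "kz": 0.520, "next_actual": 0.490, "next_expected": 0.480},
    {"actual": 0.440, "kz": 0.540, "next_actual": 0.500, "next_expected": 0.470},
]

means = group_residuals(teams, 3)
print([round(m, 3) for m in means])
```

In the study itself the same idea is applied with 7 groups of 124 teams each, and the group averages are the 1.5% and 0.7% figures quoted above.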
I would put it this way: that we are engaged in an endless struggle to distinguish in the statistics between what is real and what is an illusion. We long ago learned to focus on runs scored and runs allowed, as predictors, because these were more reliably predictable than wins. Voros showed us that we should focus on strikeouts and walks for pitchers, rather than hits allowed, because hits allowed totals by pitchers have too much illusion.
This, I think, is just another little tool we can use—like Abe Lincoln Scores—to help us to focus on what is most "real" in the team’s record. If a team scores runs or prevents runs without controlling the strike zone, we should be a little bit suspicious of that team. That, I think, is the real message here.
One other observation before I go, just a kind of an obvious thing that I’d never noticed before, although it is obvious once you’ve noticed it. As well as "strike zone success" here we could also measure "strike zone weight". The 1993 Phillies, for example, had a strike zone won-lost record of 105-85, which is 190 decisions. They had lots of strikeouts and lots of walks, both ways. They led the league in walks drawn (665). They were second in strikeouts by hitters (1049). Their pitchers were also third in the league in walks issued (573), and first in strikeouts (1117).
On the other end of that you have the 1983 Kansas City Royals. Playing the same number of games, their hitters had only 397 walks and 722 strikeouts, and their pitchers had 471 walks and 593 strikeouts. Their strike zone won-lost record was 56-81, which is 137 decisions. All together, the 1993 Phillies had 3,404 strikeouts and walks, and the 1983 Royals had 2,183.
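As a quick check on those totals, "strike zone weight" in its simplest form is just the sum of the four counts. The function name below is hypothetical; the inputs are the figures quoted above.

```python
def strike_zone_weight(bb_bat, so_bat, bb_pit, so_pit):
    """Total strikeouts and walks, batting and pitching combined."""
    return bb_bat + so_bat + bb_pit + so_pit


phillies_1993 = strike_zone_weight(665, 1049, 573, 1117)
royals_1983 = strike_zone_weight(397, 722, 471, 593)
print(phillies_1993, royals_1983, round(phillies_1993 / royals_1983, 2))
```

By this count the 1993 Phillies were involved in a little over one and a half times as many strikeouts and walks as the 1983 Royals.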
We could call teams like the 1993 Phillies "strike zone heavy", and teams like the 1983 Royals "strike zone light". In 2011 the lightest team was the Twins—who have been "light" for years—and the heaviest was the Yankees. The Yankees and Red Sox play four-hour games, traditionally, because both teams are strike zone heavy. Not this year; the Red Sox this year have lost control of the strike zone.
Anyway, my new observation here was this. Strike zone weight is characteristic of an old hitter, but of a young pitcher. As a batter ages he walks more. He strikes out less for a few years, but then he also begins to strike out more. His "heaviest" years—most strikeouts and walks—will tend to be when he is an older player.
But for a pitcher, I would assume, the pattern is exactly the opposite: as he ages he tends to have fewer strikeouts and fewer walks. If you identified the "strike zone weight" for each pitcher with a long career, I would expect most to have their heaviest seasons early, and their lightest seasons late. I just thought that was kind of interesting.
Thanks,
Bill James