Pitch to Contact, or Don’t, I Don’t Care
Recently I have been exposed to a broadcaster who talks a lot about Pitching to Contact, so. . . OK, are pitchers more effective when they pitch to contact, or are they not? I don’t claim to have absolutely resolved that issue or to have any deep insight into it, but I have done a little study of the issue, which I will share with you.
I have game logs for tens of thousands of pitcher starts. Based on the game log, you can almost figure out the "contact percentage" against each pitcher in each start, almost. If you take the number of outs the pitcher has recorded, add his hits allowed and his walks allowed, you have the number of batters that he has faced. If you subtract from that the number of strikeouts and walks, then you have the number who have made contact. Divide one number by the other, and you have the percentage of batters who have made contact. There are little problems there like errors, double plays, runners caught stealing and hit batsmen, but generally. . . . If I let little problems with the data stop me, I would have stopped in the 1970s.
On August 4, 2006, with the Yankees and facing the Orioles, Randy Johnson pitched 6 innings, giving up 8 hits, so that makes 26 batters faced. He struck out no one and walked no one, so in that game the contact percentage against Randy Johnson was 100%, 26 out of 26. This was the only start of his career in which the Big Unit didn’t strike out or walk anyone. Actually it wasn’t 26 for 26; it was 25 for 25, as there was a double play in the game, but. . . oh, well.
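The arithmetic described above can be sketched in a few lines of Python. This is only an illustration of the approximation, not the code used for the study; the function name `contact_percentage` and its parameters are made up for the example, and like the method itself it ignores errors, double plays, runners caught stealing and hit batsmen.

```python
def contact_percentage(outs, hits, walks, strikeouts):
    """Approximate share of batters faced who made contact in one start.

    Batters faced is estimated as outs + hits + walks; contact is that
    total minus strikeouts and walks. Returns None when the pitcher
    recorded no outs, hits, or walks, since there is nothing to measure.
    """
    batters_faced = outs + hits + walks
    if batters_faced == 0:
        return None
    contact = batters_faced - strikeouts - walks
    return contact / batters_faced


# Randy Johnson, August 4, 2006: 6 innings (18 outs), 8 hits,
# no walks, no strikeouts.
print(contact_percentage(outs=18, hits=8, walks=0, strikeouts=0))
```

Returning None for a start with no batters measured mirrors the handful of starts that had to be left out of the study.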
On April 30, 2008, back with the D’backs, the 44-year-old giant pitched only four innings against the Astros, giving up 9 hits (that makes 21 batters) and striking out two, walking none, for a contact percentage of .905, the closest he ever came to .900. There are thousands of games in my data in which the contact percentage of the pitcher was exactly .900, but Randy never had one. On the other hand, he had nine games in his career in which the contact percentage was exactly .800. On May 19, 1988 against the Rangers, he faced 15 batters, walked one and struck out two, so that’s a contact percentage of .800. On August 9, 1991, he faced 25 batters, walked one and struck out four, so that also is a contact percentage of .800, 20 out of 25. On April 8, 2001 against the Cardinals he faced 30 batters, walked two and struck out four, so that again is a contact percentage of .800, 24 out of 30.
He had five games in his career in which his contact percentage was .700. Interestingly enough, in all of those it was 21 out of 30, and in four out of the five he struck out 7 batters and walked 2. In the other one he struck out 8 and walked 1. Perhaps I have done enough of this now, and we get the point, James; let’s move along.
So I figured the contact percentage for each pitcher in each start, in a total of 241,511 starts, which became 241,506 because there were a handful of starts in which a pitcher was injured and came out of the game without having recorded an out, a walk or a hit allowed. I just ignored those because you can’t figure a contact percentage in those. Then I sorted the data into ten groups of games because, well, you know me; I like to sort data into ten groups of games.
The top group, Group 10, we will refer to as the 97% contact group. It is actually all pitchers who allowed contact to more than 95.0000% of batters that they faced in a game; we will refer to that as 97%. The second group was pitchers who allowed contact to more than 90% of the hitters they faced, but not more than 95.0000%. We will refer to that as 92%. The third group was 85.0001% to 90%, which we will refer to as 87%, etc. The bottom group of pitchers was all pitchers whose contact rate in a game was 55.0000% or less. These are the ten groups, and these are the numbers of games in each group:
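One way to express those ten bins in code is sketched below. The names `UPPER_EDGES`, `contact_group` and `group_label` are invented for the illustration, and using exact fractions is my own choice (so that a start sitting exactly on 90% or 95% falls into the lower group, as described); nothing here is taken from the actual study files.

```python
from bisect import bisect_left
from fractions import Fraction

# Upper edges (in percent) of Groups 1 through 9.
# Group 10 is everything above 95%.
UPPER_EDGES = [55, 60, 65, 70, 75, 80, 85, 90, 95]


def contact_group(contact, batters_faced):
    """Return the group number (1 to 10) for one start.

    Each group runs from just above its lower edge up to and including
    its upper edge, so exactly 95% is Group 9 and exactly 55% is Group 1.
    """
    pct = Fraction(100 * contact, batters_faced)
    return bisect_left(UPPER_EDGES, pct) + 1


def group_label(group):
    """Shorthand label: 52% for Group 1 up to 97% for Group 10."""
    return f"{52 + 5 * (group - 1)}%"


# A 21-for-30 start (70% contact) lands in the group labeled 67%.
print(group_label(contact_group(21, 30)))
```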
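| Group | Contact % | Number of Games |
|---|---|---|
| 10 | 97% | 6157 |
| 9 | 92% | 16063 |
| 8 | 87% | 35183 |
| 7 | 82% | 43417 |
| 6 | 77% | 46539 |
| 5 | 72% | 41075 |
| 4 | 67% | 25836 |
| 3 | 62% | 12999 |
| 2 | 57% | 8042 |
| 1 | 52% | 6195 |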

So the most common contact percentages, as you can see, are between 70 and 85%; that’s where most of the games are. Of course, it is higher some years than others. There is a timeline bias, which messes with our ability to interpret the data, and there are all kinds of other biases in the data, of which perhaps the most important is the home plate umpire bias. If one pitcher walks several batters in a game there is a chance that the other pitcher will as well, because the umpire has a small strike zone. Lots of biases in the data.
The question I was trying to get to was, are pitchers more effective when their contact percentage is high, or when it is low? Here’s a first look at the data:
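| Group | Contact % | # | SP Wins | SP Losses | SP WPct | ERA | SO PG | W PG |
|---|---|---|---|---|---|---|---|---|
| 10 | 97% | 6157 | 1446 | 2695 | .349 | 5.86 | 0.6 | 0.4 |
| 9 | 92% | 16063 | 5543 | 6339 | .467 | 4.40 | 1.8 | 1.2 |
| 8 | 87% | 35183 | 12791 | 13339 | .490 | 4.16 | 3.0 | 1.9 |
| 7 | 82% | 43417 | 16062 | 16233 | .497 | 4.02 | 4.3 | 2.5 |
| 6 | 77% | 46539 | 17302 | 17092 | .503 | 3.94 | 5.5 | 3.1 |
| 5 | 72% | 41075 | 14700 | 15022 | .495 | 3.98 | 6.9 | 3.7 |
| 4 | 67% | 25836 | 9305 | 9083 | .506 | 3.94 | 8.3 | 4.3 |
| 3 | 62% | 12999 | 4803 | 4457 | .519 | 3.88 | 9.7 | 4.8 |
| 2 | 57% | 8042 | 2779 | 2738 | .504 | 4.02 | 11.1 | 5.4 |
| 1 | 52% | 6195 | 1909 | 2132 | .472 | 4.41 | 13.4 | 6.7 |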

Are the categories clear? SP Wins is Starting Pitcher Wins, SO PG and W PG are strikeouts per nine innings and walks per nine innings. The data suggests, generally, that pitchers are most effective when they AVOID pitching to contact, as long as they don’t get crazy about it. The most effective group of pitchers was those who had a 62% contact rate. They had the highest winning percentage of any group, .519, and the lowest ERA, 3.88.
The pitchers who had a 97% contact rate had a .349 winning percentage and a 5.86 ERA. Those are not good numbers, but the data for that group is skewed because of a tendency of short games to be extreme games. A pitcher—let’s say the same pitcher. Any pitcher is more likely to have a contact percentage of 100%, or of zero %, if he only faces five batters in a game than if he faces 40. He is more likely to have a very high or very low contact percentage if he faces 10 batters than if he faces 30. This tends to push the short starts into the marginal groups, and, of course, the short starts are also the BAD starts, and the long starts are also the GOOD starts.
It’s a bias in the data; it tends to show the center of the chart as more effective than the top and bottom of the chart. But setting that effect aside and ignoring the 97% group and the 52% group, the data clearly shows that the most effective pitchers are those who don’t pitch to contact TOO MUCH.
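The short-start effect is easy to see with a toy calculation. The sketch below assumes, purely for illustration, that every batter independently makes contact with probability 0.75; that figure and the function name `prob_contact_above` are my assumptions, not numbers from the study. Under that assumption it computes the chance that a start of a given length ends up above the 95% line.

```python
from math import comb


def prob_contact_above(batters, threshold_pct, p_contact=0.75):
    """Chance that more than threshold_pct of `batters` make contact.

    Assumes each batter independently makes contact with probability
    p_contact (an illustrative assumption, not a figure from the study).
    """
    total = 0.0
    for k in range(batters + 1):
        # Integer comparison avoids rounding trouble at the boundary.
        if 100 * k > threshold_pct * batters:
            total += (
                comb(batters, k)
                * p_contact**k
                * (1 - p_contact) ** (batters - k)
            )
    return total


# The same pitcher is far more likely to land in the top group
# in a short start than in a long one.
for n in (5, 10, 30, 40):
    print(n, prob_contact_above(n, 95))
```

With only five batters the pitcher needs all five to make contact, which under this assumption happens about a quarter of the time; with 40 batters it almost never happens.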
Which is, in a sense, obvious; we have documented the obvious. When the pitching coach says "Pitch to contact", he doesn’t REALLY mean "just let the batters hit the ball; you’ll be OK." Obviously the pitcher doesn’t want to just let the batter make contact. What he really means is "don’t be SO afraid of letting the batter hit the first pitch that you wind up pitching to everybody 2-0 and 3-1." If you can strike the batter out, good, but if you can’t strike him out, don’t walk him. It’s really a matter of asserting the second principle (don’t work behind the hitters) so strongly that you give the impression that you think it is the FIRST principle. The first principle is "don’t just let them hit the ball, you jackass."
OK, that is the main data of the study; I am sort of done now. The winning percentage of the starting pitcher’s TEAM is a little different from the winning percentage of the starting pitcher:
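| Group | Contact % | Team Winning % | SP WPct |
|---|---|---|---|
| 10 | 97% | .408 | .349 |
| 9 | 92% | .482 | .467 |
| 8 | 87% | .495 | .490 |
| 7 | 82% | .502 | .497 |
| 6 | 77% | .505 | .503 |
| 5 | 72% | .503 | .495 |
| 4 | 67% | .509 | .506 |
| 3 | 62% | .516 | .519 |
| 2 | 57% | .510 | .504 |
| 1 | 52% | .494 | .472 |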

When starting pitchers have lower contact percentages (Group 3) their teams have higher winning percentages, but the starting pitcher sometimes doesn’t get credit for the win, because he tends to be out of the game a little bit earlier. Also, it would be nice if the offensive support was a constant up and down the chart, but because of the biases I was talking about earlier, that isn’t even close to being true. The pitchers in the 97% contact group were supported by 4.50 runs per game and gave up an average of 5.38 (for the entire game, counting relievers) while the pitchers in the 62% contact group were supported by only 4.35 and gave up only 4.26.
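| Group | Contact % | # | RPG | ORPG |
|---|---|---|---|---|
| 10 | 97% | 6157 | 4.50 | 5.38 |
| 9 | 92% | 16063 | 4.45 | 4.62 |
| 8 | 87% | 35183 | 4.48 | 4.47 |
| 7 | 82% | 43417 | 4.42 | 4.36 |
| 6 | 77% | 46539 | 4.38 | 4.32 |
| 5 | 72% | 41075 | 4.37 | 4.35 |
| 4 | 67% | 25836 | 4.37 | 4.31 |
| 3 | 62% | 12999 | 4.35 | 4.26 |
| 2 | 57% | 8042 | 4.41 | 4.38 |
| 1 | 52% | 6195 | 4.44 | 4.63 |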

This happens in part because, when you win the game at home, you usually don’t bat in the bottom of the 9th, unless you are the 2017 Boston Red Sox and win most of your home games only in the bottom of the 9th or extra innings. The runs per inning would be flatter than the runs per game. 53% of the starts in the 97% contact group are "road" games for the starting pitcher, and 51% are road games in the 92% and 87% groups. 51% of the games in the middle/lower part of the chart are home games. Here are the averages per 100 starts for each group:
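| Group | Contact % | IP (thirds) | H | R | ER | BB | SO | CG | ShO |
|---|---|---|---|---|---|---|---|---|---|
| 10 | 97% | 1434 | 680 | 343 | 311 | 24 | 34 | 8.4 | 1.6 |
| 9 | 92% | 1848 | 729 | 331 | 301 | 83 | 124 | 18.6 | 4.1 |
| 8 | 87% | 1898 | 704 | 323 | 292 | 135 | 212 | 17.1 | 3.4 |
| 7 | 82% | 1930 | 673 | 318 | 287 | 181 | 302 | 18.0 | 3.9 |
| 6 | 77% | 1914 | 633 | 310 | 279 | 222 | 389 | 16.4 | 3.8 |
| 5 | 72% | 1866 | 587 | 305 | 275 | 259 | 474 | 14.2 | 3.5 |
| 4 | 67% | 1828 | 539 | 295 | 266 | 292 | 564 | 12.6 | 3.7 |
| 3 | 62% | 1807 | 500 | 287 | 260 | 324 | 652 | 13.1 | 3.9 |
| 2 | 57% | 1691 | 441 | 279 | 252 | 338 | 697 | 10.2 | 3.6 |
| 1 | 52% | 1465 | 343 | 265 | 239 | 366 | 729 | 8.7 | 3.6 |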

Pitchers in the 67% group recorded an average of 18.28 outs per start, or just over six innings, striking out 5.64 batters and walking 2.92. You can see that as the contact percentage goes down, the innings pitched per start also go down pretty sharply. In part, that’s a timeline bias. Over time, longer starts have become more rare, and strikeouts more common, which biases the data in that direction, although this would probably be true anyway. The highest rates of complete games and shutouts are in the 92% contact group, although the winning percentage of that group is low (.467) and the ERA is high (4.40).
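The per-100-starts figures convert to per-start and per-nine-inning rates with simple arithmetic. The sketch below uses the 67% row of the table (1828 outs, 266 earned runs, 564 strikeouts, 292 walks per 100 starts); the helper names `per_start` and `per_nine_innings` are invented for the example.

```python
def per_start(per_100_starts):
    """Convert a per-100-starts total to a per-start average."""
    return per_100_starts / 100


def per_nine_innings(count, outs):
    """Rate per nine innings, where nine innings is 27 outs."""
    return 27 * count / outs


# 67% contact group, totals per 100 starts.
outs, earned_runs, strikeouts, walks = 1828, 266, 564, 292

print(per_start(outs))                      # outs per start
print(per_nine_innings(earned_runs, outs))  # ERA
print(per_nine_innings(strikeouts, outs))   # strikeouts per nine innings
print(per_nine_innings(walks, outs))        # walks per nine innings
```

The results agree with the earlier table for that group to within rounding, since the per-100-starts inputs are themselves rounded.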
Now, I know that some of you are wanting to ask this question, so I will go ahead and address it. Obviously Randy Johnson, Nolan Ryan and Sam McDowell and similar pitchers, Bobby Witt and J. R. Richard and Bob Veale, have lower contact percentages than Tommy John, Doyle Alexander, Claude Osteen and Tom Glavine. But do the same principles apply? Or might it not be that Nolan Ryan-type pitchers are most effective when they AVOID contact, but Tom Glavine-type pitchers are most effective when they pitch to contact?
Tommy John made exactly 700 starts in his career. In the 350 starts in which he had the HIGHEST contact percentages, he was 150-110 (.577), but had a 3.47 ERA. In those starts he struck out only 635 batters and walked only 423. When his contact percentage was lower, he had a BETTER ERA (3.21), but a lower winning percentage (134-118, .531). In those starts he struck out 1,556 batters and walked 800. That data doesn’t seem to make any sense, so I don’t know what to do with that. I suppose that the anomaly results from the fact that more of his low-contact starts were early in his career, in the 1960s, and in the 1960s the run scoring levels were much lower. If we assume that that is the true explanation, then we should treat his winning percentage as more relevant than his ERA, and conclude that he WAS, in fact, most effective when his contact percentage was high. 29 of his 46 career shutouts were in the low-contact group.
Randy Johnson made 603 starts in his career, which we will count as 301 in each group and ignore the start right in the middle. Randy was clearly most effective when his contact percentage was low. When his contact percentage was high he was 144-90 (.615) with a 3.67 ERA; when his contact percentage was low he was 157-76 (.673) with a 2.94 ERA, and in those starts he struck out 3,063 batters in 2,083 innings.
Nolan Ryan had 773 starts in his career, which breaks down as 386-1-386. He also (like Randy) was clearly most effective when his contact percentage was low. When his contact percentage was high he was 146-154 in his career (.487) with a 3.67 ERA. When his contact percentage was low he was 172-136 (.558) with a 2.71 ERA, and he struck out 3,570 batters in 2,698 innings when his contact percentage was low.
Tom Glavine made 682 starts in his career. But Glavine was. . . well, maybe a LITTLE BIT more effective when his contact percentage was low, but about the same. When his contact percentage was high he was 152-104, 3.56 ERA. When his contact percentage was low he was 153-99, 3.50 ERA, really almost the same. .594 winning percentage when his contact percentage was high, .607 when it was low.
Greg Maddux is like Glavine, a little better with a low contact percentage. 740 career starts, 370 in each group. When his contact percentage was high, he was 174-113, a .606 winning percentage and a 3.30 ERA. In those games he pitched 2,479 innings, striking out only 1,125 but walking only 340. When his contact percentage was low, he was a little better; he was 181-113, a .616 percentage, pitching 2,522 innings, striking out twice as many hitters (2,241) and walking almost twice as many (655).
So. . . I don’t know. Part of the reason this is hard to check out is that there really aren’t any pitchers who have long careers with low strikeout rates, so you have to check out pitchers who have RELATIVELY low strikeout rates, but not REALLY low strikeout rates, or else you would have to do short careers.
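The high/low split used for these pitchers can be sketched as follows. This is a minimal illustration, not the actual procedure: the `Start` record, its field names, and the two helper functions are hypothetical, and how ties in contact percentage were broken in the study is not stated, so here they simply fall wherever the sort puts them.

```python
from dataclasses import dataclass


@dataclass
class Start:
    contact_pct: float
    outs: int
    earned_runs: int
    decision: str  # "W", "L", or "" for no decision


def split_by_contact(starts):
    """Return (low_half, high_half) of a pitcher's starts by contact
    percentage, dropping the middle start when the count is odd."""
    ordered = sorted(starts, key=lambda s: s.contact_pct)
    half = len(ordered) // 2
    return ordered[:half], ordered[len(ordered) - half:]


def summarize(starts):
    """Return (wins, losses, winning percentage, ERA) for a set of starts."""
    wins = sum(s.decision == "W" for s in starts)
    losses = sum(s.decision == "L" for s in starts)
    outs = sum(s.outs for s in starts)
    earned_runs = sum(s.earned_runs for s in starts)
    wpct = wins / (wins + losses) if wins + losses else None
    era = 27 * earned_runs / outs if outs else None
    return wins, losses, wpct, era
```

Calling `summarize` on each half gives the won-lost record and ERA for the high-contact and low-contact starts, which is the comparison made for each pitcher above.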
OK, while I have you here I am also going to report some additional data from the previous study.
Tom Tango had asked what the winning percentage of the "hot" teams was during the previous ten games. Well, here’s the relevant data. . .more data than you had asked for. The teams in Group 1 had played, on average, 89.8 games at the moment they were measured as hot; that is, the identification on average was in the middle of the season, but a little past the midpoint because you usually have to play 15 or 20 games before you can be identified as hot or cold.
In their last ONE game, the teams in Group 1 had a winning percentage of .781 23 (16,262 wins, 4,554 losses.) In their last FIVE games, the teams in Group 1 had a winning percentage of .755 451, average 3.777 wins and 1.223 losses. In their last TEN games, the teams in Group 1 had a winning percentage of .730 680, averaging 7.287 wins and 2.686 losses; a few of the teams were not yet ten games into the season. In their last FIFTEEN games, the teams in Group 1 had a winning percentage of .709 027, averaging 10.543 wins and 4.327 losses. So their winning percentage had been .755 in the last five games, .706 in the five games before that, and .665 in the five games before that.
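The .706 and .665 figures come from subtracting one cumulative window from the next. A short sketch of that subtraction, using the average wins and losses quoted above (the dictionary and function names are just for illustration):

```python
# Average (wins, losses) for Group 1 teams over their last N games.
cumulative = {
    5: (3.777, 1.223),
    10: (7.287, 2.686),
    15: (10.543, 4.327),
}


def window_pct(shorter, longer):
    """Winning percentage in the games covered by `longer` but not `shorter`."""
    wins = longer[0] - shorter[0]
    losses = longer[1] - shorter[1]
    return wins / (wins + losses)


last_five = cumulative[5][0] / sum(cumulative[5])
games_6_to_10 = window_pct(cumulative[5], cumulative[10])
games_11_to_15 = window_pct(cumulative[10], cumulative[15])
print(round(last_five, 3), round(games_6_to_10, 3), round(games_11_to_15, 3))
```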
On average, the season-to-date winning percentage of the Hot teams was .590, averaging 52.93 wins and 36.81 losses to date. On average, the end of season winning percentage of the teams in Group 1 was .573 (91.68 wins, 68.25 losses), since good teams are hot more often than average teams, although all teams are hot and cold at some points of the season. (Not technically true; there were two teams in the study which never had a Heat Index lower than 50.000, and there was at least one team that never got over 50.000. I think the two teams which never had a cold snap in the season were the 1970 Baltimore Orioles and the 1976 Reds, but it could have been the 1969 Orioles and the 1975 Reds or something. I did that study but then I didn’t save the data, because I decided at the time not to publish it. I think the team that never got hot may have been the 2003 Tigers, but again, I’m not sure of that. I’m not sure the 1988 Orioles were ever hot.)
In the previous report, I didn’t EXACTLY explain how I reached the conclusion that the "recent performance effect" was about 1%. Here’s how I reached that conclusion.
When you divide the next-game wins of the hottest teams by the EXPECTED next-game wins of the hottest teams, you get a ratio of .964 590, a figure which was reported in the study. This number is less than 1.0000 because of the MASOG effect; when we decide which teams have been hottest, we are talking about teams which have more wins behind them relative to their final won-lost record, and thus fewer wins ahead of them.
But you remember that in the study I randomized the sequence of games four different times to create expectations for the data if there was no recent-wins effect. When I randomized the sequence of games and repeated the study, the parallel number to this (.964 590) was, on average, .960 434, a figure NOT specifically reported in the original study. I concluded that there WAS a recent-wins effect, in part, because the hottest teams performed relatively better in the actual-sequence than in the random-sequence data.
But not, in this case, 1% better. Comparing .964 590 to .960 434, they were actually only half a percent better; actually, .004 327. Why, then, did I say 1%?
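For the record, the .004 327 is the ratio of the two figures minus one, not their simple difference. A two-line check (variable names are mine):

```python
actual_sequence = 0.964590   # hottest teams, real game order
random_sequence = 0.960434   # hottest teams, average of randomized orders

relative_gain = actual_sequence / random_sequence - 1
print(round(relative_gain, 6))
```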
When you compare two derivative calculations, you get unstable data, because all of the instability in either of the original components will be included in the new calculation; I hope that the data analysts will understand what I mean by that, although I know the just-baseball-fans won’t. While the number was only .0043 for Heat Group 1, it was one and a half percent for Heat Group 2, actually .015 480. Also, in Heat Group 10 (the COLDEST teams in the study), this number was negative .009 019, basically 1%. Then, in the center of the chart (the other seven heat groups) the number was smaller, although it was still mostly positive in the top half of the chart and mostly negative in the bottom half of the chart.
I decided (my interpretation of the data) that 1% was a better estimate of the effect than 4/10ths of one percent, so that was what I reported. But I didn’t get into the data related to this issue because it seemed to me that I was wandering into a speculative swamp. It just seemed to me that it was better to say "OK, it’s about 1%" and cut it off there, rather than putting forward calculations which are based on previous derivative calculations, which tend to be unstable.
I don’t know EXACTLY why Tom wanted this data or what he was going to do with it, but it sort of seemed to me that he was. . . . trying to move aggressively with this data. I’m not quite sure exactly what you’re trying to do, Tom, but it seems to me speculative. For one thing, the "Heat Index" is not EXACTLY derived from a team’s last ten games, or from their last five games, or from their last fifteen games; it is derived from some indefinite number of games which varies with HOW hot the team has been, but sometimes teams stay hot for 25 games. There is one team in the data but only one (the 1978 Brewers) which managed to make Heat Group 1 only 3 games into the season. They not only won those three games, they won them by a total of 29 runs, almost ten runs a game (11-3, 16-3, 13-5). I think that’s where the nickname "Bambi’s Bombers" came from, from that season-opening series. That’s unusual, to win three straight games by 29 runs, and that marked them as a red-hot team (top 10%) after only three games.
Anyway, my point is that if you assume that these numbers represent the last 10 games when in reality they might represent 3 exceptional games or the last 25 games, that is another potential source of error in an area in which 1) we have limited understanding of the effect under scrutiny, and 2) there are already numerous possible sources of error. I wouldn’t be optimistic that your calculation would stand up to scrutiny by future researchers, I am saying. But I didn’t quite understand what you were trying to do, so I will leave that up to you.