Runs Created by an Individual Hitter In a Game
Perhaps I can explain the purpose of this research by explaining how I came to it. I am working on a system to re-evaluate players day to day, so I needed a "Game Score" for a hitter. I worked with several different formulas, but none of them were quite right. I had a Game Score for hitters that I used on background research for a year or more. I liked it, obviously, or I wouldn't have used it for a year, but it had its problems. Almost everybody would have one to ten games in their careers which scored in the range of 25 to 30; an occasional player would have a game that would score over 30, and truly historic games, like 4-homer games, would score around 50. Many, many games, however, would get negative numbers. . . any oh-fer was a negative number, so everybody had negative numbers 30 or 40% of the time. I didn't like that.
I tried to create a zero-to-a-hundred type scale, like the Game Scores for Pitchers, but I tried to create a zero-to-a-hundred scale with proportional representation of events. That didn't really work; I got everybody in the range between zero and a hundred, but I had to compromise a little on the proportional value concept, and even then the average game was about 17. Hitting is just like that; you have a couple of guys who carry the team and have scores of 40 or 50, and then you have a bunch of guys at ten and a couple of guys who are in single digits, and the guys who score at 40 or 50 aren't really having BIG days, not career days.
I decided I could live with a zero-to-a-hundred scale in which the average was 30, so I sacrificed the concept of proportional representation of value, and I got everybody between zero and 100 with an average of about 25, but that system tended to pick the wrong games as the best games, the VERY best games. Eventually I was driven to the question: OK, why don't I just evaluate the hitter's game logs by the player's runs created in each game? That is the logical question I should have gotten to a long time ago.
So how do you measure a hitter’s runs created in a single game? The form of all runs created methods is A * B / C, where A is times on base, B is "advancement" or total bases, basically, and C is opportunity or plate appearances. But multiplying and dividing very small numbers leads to screwy results, as most of you probably know, so the challenge is to make a system that, measuring runs created game by game, matches as nearly as possible the runs created estimates for a season or a career.
The first thing we have to do is to create a "neutral context". It is like the ballast in a ship, or the clay in a stick of dynamite. Ballast in a ship is dead weight below the water line, which keeps the ship from rolling left and right and possibly turning over in rough water (although you need ballast even in calm water, since you always have wave action). Dynamite, as best I understand it, is nitroglycerine soaked into clay. The problem with nitroglycerine is that it is hypersensitive, so it will explode when you would much rather it didn't. Dynamite puts the nitroglycerine in an inert substance so that it doesn't explode until it is heated up by fire. Same concept; we are placing the player's individual batting accomplishments in a stable context, so that the ship doesn't turn over and the data doesn't explode.
Suppose that, in a game, a team had 12 runners on base, 12 total bases, and 36 plate appearances. 12 times 12 divided by 36 is 4; that’s four runs created. That represents the "other eight" hitters in the lineup, the "rest of the team". Teams score about 4.50 runs per game, over time, with nine hitters, which is half of a run per hitter per game. The "focus hitter" represents the other half a run, the difference between 4.00 and 4.50.
Suppose that the hitter goes 0-for-4; that makes 12 runners on base and 12 total bases, but 40 plate appearances, so that's 12 times 12 divided by 40, which is 3.60. The hitter's runs created in the game would be negative .40, the difference between 4.00 and 3.60.
But suppose that he goes 3-for-4 with a home run; that makes 15 runners on base and 18 total bases in 40 plate appearances; that's 15 times 18 divided by 40, which is 6.75. The hitter has created 2.75 runs with his homer and two singles, one out; 6.75 minus 4.00.
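For readers who want to check the arithmetic, here are the examples above as a minimal Python sketch; the function name `game_rc` is mine, and the 12/12/36 baseline numbers are from the text:

```python
# Basic runs created, A * B / C, applied to the "neutral context" examples.

def game_rc(on_base, total_bases, plate_appearances):
    """A * B / C: times on base, advancement, opportunity."""
    return on_base * total_bases / plate_appearances

baseline = game_rc(12, 12, 36)   # the "other eight" hitters: 4.00 runs

# 0-for-4: no times on base or total bases added, four more plate appearances.
print(round(game_rc(12, 12, 40) - baseline, 2))   # -0.4

# 3-for-4 with a homer: three more runners, six more total bases.
print(round(game_rc(15, 18, 40) - baseline, 2))   # 2.75
```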
These are all numbers that work in general terms; actual teams do have about 12 runners on base and 12 total bases and about 36 plate appearances from their "other eight" hitters, in a typical game; not exactly, but it works as dead weight. A homer has a run value of about 1.65 runs, a single about .70 runs each, an out has a certain cost; it totals up to a number not too far off from 2.50. I am describing the direction of march here, not the destination; we’ll arrive at different numbers, but this is how we start out.
So we say as a starting point that the A factor is (12 + H + W + HBP): twelve plus the hitter's hits in the game, plus his walks, plus his times hit by pitch.
The B factor as a starting point is (12 + TB), twelve plus the total bases for this hitter.
The C factor as a starting point is (36 + PA), thirty-six plus plate appearances.
Getting technical now, we modify the "A" factor so that it becomes
12 + H + W + HBP – CS – GDP – (IBB/3)
We take caught stealing and double play balls out of the times on base, and we also take away a third of the intentional walks, because intentional walks almost all occur with outs already recorded in the inning, so a hitter who gets an intentional pass is somewhat less likely to score a run than a hitter who gets an ordinary, unintentional walk, which may occur with no one out.
The B factor starts out as 12 + Total Bases, but we need to adjust for (a) Sacrifice Hits and Sacrifice Flies, which advance runners, (b) Stolen Bases, which advance runners, (c) the fact that walks and hit batsmen, while they are not primarily "advancement" mechanisms in an offense, will also sometimes advance baserunners, and (d) the fact that strikeouts do NOT advance runners. Adjusting for all of that, we change the simple formula element 12 + TB into this big hairy formula element:
12 + TB + (BB – IBB + HBP) * .3 + (SH + SF + 1.6 * SB) * .4 – .07 * SO
That’s the "B" factor, and we’re not done yet; we’ll modify it one more time before we’re done. The "C" factor, mercifully, remains the same—36 + AB + BB + HBP + SH + SF, or simply 36 + PA.
And then you multiply A times B, divide by C, and subtract 4.00; that subtraction is the crucial step. At the end of the process we remove the inert material to measure the batter's individual contribution.
One more adjustment. The formula tends to overestimate runs created by something like seven-tenths of one percent. The reason it does this is that the "dead weight" that we put into the formula was square: 12 men on base, 12 total bases. In reality, offenses are not "square" in that sense; there are more total bases than men on base. The most efficient offense, in a certain sense, would be a square offense, but offenses in general are not square. We can adjust for that by changing the 12 times 12 to 11 times 13: 11 men on base plus the hitter, 13 advancement bases plus the hitter. 12 times 12 is 144; 11 times 13 is 143, so making that adjustment lowers the resulting runs created estimates by some tiny amount, and makes the estimates more accurate, I believe. I will explain in a minute why I believe they are more accurate.
OK, ONE little last step. We don’t actually use ELEVEN "dead weight" baserunners; we actually use 11.035. It’s just a tiny little thing, but if you measure it on a large enough scale it makes a difference. So the final formula for a hitter’s runs created in a game is:
A * B / C, minus 4.00, where
A is 11.035 + H + W + HBP – CS – GDP – (IBB/3)
B is 13 + TB + (BB – IBB + HBP) * .3 + (SH + SF + 1.6 * SB) * .4 – .07 * SO, and
C is 36 + AB + W + HBP + SH + SF, or simply 36 + PA.
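Put together, the final formula is straightforward to compute. Here is a minimal sketch in Python; the function name and argument names are my own, and the constants come straight from the formula above:

```python
# Runs created by one hitter in one game, per the final A * B / C - 4.00 formula.

def hitter_game_rc(h, bb, ibb, hbp, tb, cs, gdp, sh, sf, sb, so, pa):
    a = 11.035 + h + bb + hbp - cs - gdp - ibb / 3
    b = (13 + tb + (bb - ibb + hbp) * 0.3
         + (sh + sf + 1.6 * sb) * 0.4 - 0.07 * so)
    c = 36 + pa
    return a * b / c - 4.00

# The 3-for-4, one-homer game from earlier (two singles, a homer, one out):
print(round(hitter_game_rc(h=3, bb=0, ibb=0, hbp=0, tb=6, cs=0, gdp=0,
                           sh=0, sf=0, sb=0, so=0, pa=4), 2))   # 2.67
```

Note that with the 11.035 and the 13 in place, the same game comes out slightly lower than it does under the rough 12-and-12 version, which is exactly what the adjustment is meant to do.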
OK, two questions now:
(1) How do we know this works, and
(2) What good is it?
1) How do we know it works?
I was trying to "match" the runs created estimates from other sources—specifically, the All-Time Handbook, which is a book we published 25 years ago which has good runs created methods for every player up to that time. (You've probably never seen the book; it didn't sell a lot of copies.) Anyway, I have game logs for a few hitters. . . whole careers for about 35-40 hitters; it comes to 60,000 games and 478 player/seasons. I compared the runs created for each player/season to the sum of the individual game runs created. For example:
First      Last Name     Year   New Method   Old Method
Zoilo      Versalles     1969       25           26
Cesar      Tovar         1969       82           81
Maury      Wills         1960       63           63
Dick       Allen         1976       49           48
Johnny     Callison      1965      106          100
Jim        Gilliam       1962       84           88
Dick       Allen         1967      102          104
Jim Ray    Hart          1966       95           97
Bill       White         1958        5            5
Rico       Petrocelli    1974       65           64
Matty      Alou          1971       91           89
Al         Spangler      1959        3            3
Tony       Oliva         1963        1            1
Al         Kaline        1972       46           48
Dick       Groat         1958       72           70
Jim        Gilliam       1965       59           60
Dale       Mitchell      1946        9            8
Dick       Allen         1975       52           48
Jim        Gilliam       1956      104          101
Orlando    Cepeda        1971       36           34
Bob        Allison       1959       88           89
Ken        McMullen      1967       62           67
Al         Kaline        1968       58           60
Paul       Blair         1966       37           39
Matty      Alou          1964       22           23
Norm       Cash          1959       19           17
Andre      Thornton      1975       84           86
Andre      Thornton      1973        4            4
Cesar      Tovar         1975       44           44
Ken        McMullen      1975        7            8
Paul       Blair         1964        1            0
Matty      Alou          1970       81           77
Dick       Stuart        1958       42           40
Dick       Groat         1960       87           84
Cesar      Tovar         1974       80           81
Ed         Charles       1967       39           40
George     Scott         1971       70           69
Cecil      Cooper        1974       54           54
Jim        Gilliam       1963       76           81
Bob        Allison       1965       72           76
Tony       Oliva         1968       77           76
Jim Ray    Hart          1965       99          101
Matty      Alou          1967       85           83
Dick       Stuart        1960       65           62
Jim        Northrup      1973       68           65
Cesar      Tovar         1970      105          108
Ed         Charles       1965       64           61
Cookie     Rojas         1974       52           51
Frank      Howard        1958        3            3
Al         Spangler      1962       70           68
The estimates are always about the same, neither consistently higher nor lower, nor, as nearly as I could tell, are they higher or lower for any subgroup of players. I checked many different things to see if the estimates were too high or too low for base stealers or high-average hitters or low-average sluggers or anything, and "corrected" for any problems that I found, then re-did the estimates, etc.
In theory, it is bad analytical method to evaluate the accuracy of a new method by comparing it to an old estimate; you always try to compare an estimate to a hard fact, rather than to a previous estimate. There didn't seem to be any practical way to do that here, and, as a practical matter, I have a lot of confidence in the previous estimates, so I was OK with it.
Another thing I did was to take 20 games (40 team/games) and compare the estimated runs to the actual runs scored by the teams. That was a lot more time-consuming, because I don't have the data organized to proceed in that way, and also, in individual games you get random deviations due to the formation of clusters of events, but I didn't see any problems. There was no game of the 40 in which the new estimate was wrong by more than two runs (actually, about 2.2). There was one game where the team had 6.8 runs created and actually scored 9 runs. Of course that will happen, because of fielding and the clustering of events, but. . . it didn't seem like there was a problem there.
I’m not a sophisticated data analyst; I’m just a guy who has been doing this stuff for 50 years. I’m sure a welltrained data scientist could refine the method a little better.
2) What good is it?
In isolation, it has no value at all, but there are a lot of things you can do with it. You could use it to study whether the runs ACTUALLY resulting from a player's individual acts are greater or less than the standard accepted estimates. You could use it to study to what extent a pitcher benefitted or suffered from the "clustering" of events in his games. You could use it, as was my intention here, to track a player's performance day by day. You could use it to identify each individual hitter's best game of the season, or the best game of his career, or whatever. You never know what a method is good for until you develop the method. Most of us are usually terrible judges of the value of our own ideas.
Thanks for reading.
Bill James