
Monday Morning Blog

May 3, 2009

1.  Pitch Sequence Modeling

 

            Here’s a problem I have thought about for roughly 12 seconds a day for the last 40 years, but never been able to do anything with.    Suppose that you had an APBA or Strat-o-Matic type game which was based not on at bats, but on pitches.   It must be possible to do this, right?   There must be people who understand the math related to this, or not, but in any case I’m not one of them.

            Suppose that we have a pitcher who faces 1,000 batters in a season, strikes out 162 of them and walks 50.   That would be Whitey Ford in 1965.   In order to “model” this performance in a pitch-by-pitch simulation, you would have to be able to estimate how many pitches he threw, how many of those were balls, how many were taken for strikes, how many were swung at and missed, etc.   But given this information, it must be possible to estimate those things, right?    There has to be a way to do that, but it’s not easy to see, and I don’t have any idea how to do it.

            OK, this time I’m going to stick with the problem until I figure something out.   We could represent each pitch as a kind of binary problem passing through four gates, leading to five “pitch outcomes”.   The four gates are:

            1)  Each pitch is either taken (a) or swung at (b).

            2)  Each pitch which is taken is either called a strike (a) or a ball (b).

            3)  Each pitch which is swung at is either hit (a) or missed (b).

            4)  Each pitch which is hit is either fair (a) or foul (b).

            What we are trying to do here is to get to a place where we can say, for Whitey Ford in 1965 (or any other pitcher), that of all the pitches he threw, x% were taken and x% were swung at, of those that were taken, x% were called strikes and x% were called balls, of those that were swung at x% were hit and x% were missed, etc.             

            These four gates divide pitches into five outcomes:

 

            Balls  (B)

            Called strikes  (C)

            Swinging strikes  (S)

            Foul Balls  (F)

            Balls in Play (X)

 

            There is actually a sixth outcome, which is “ball hits batter”, which suggests that there is a fifth gate, but let’s not worry about that right now.  Let us suppose that we assume, as a starting point, that each of these is a 50/50 gate. . .of all pitches, 50% are taken and 50% swung at, of those taken, 50% are called balls and 50% are strikes, etc.   We know that’s not true, but it’s a starting point.  If all the percentages were 50/50, for a pitcher, then the pitcher would have:

 

            25% balls

            25% called strikes

            25% swinging strikes

            12.5% foul balls

            12.5% balls in play
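
            The arithmetic behind that mix can be sketched in a few lines of Python. This is only an illustration; the function and parameter names are made up here and are not taken from the spreadsheet.

```python
def outcome_mix(take, ball_if_taken, contact_if_swung, fair_if_hit):
    """Turn the four gate probabilities into the five pitch-outcome shares."""
    swing = 1.0 - take
    contact = swing * contact_if_swung
    return {
        "B": take * ball_if_taken,               # taken, called a ball
        "C": take * (1.0 - ball_if_taken),       # taken, called a strike
        "S": swing * (1.0 - contact_if_swung),   # swung at and missed
        "F": contact * (1.0 - fair_if_hit),      # hit foul
        "X": contact * fair_if_hit,              # hit fair (ball in play)
    }

# Every gate set to 50/50, as in the starting assumption
mix = outcome_mix(0.5, 0.5, 0.5, 0.5)
print(mix)
```

            Whatever the gate values, the five shares always add up to 1, since each gate just splits the share that reaches it.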

 

            Suppose that a pitcher had that mix, and suppose that pitch sequences were random selections from that mix.   How many batters would he strike out, how many would he walk, and how many would put the ball in play?

            Well, can we chase down all the sequences?   For one pitch we would have five options (B, C, S, F and X).   After two pitches we would have 21 possible sequences:

 

            BB

            BC

            BS

            BF

            BX

 

            CB

            CC

            CS

            CF

            CX

 

            SB

            SC

            SS

            SF

            SX

 

            FB

            FC

            FS

            FF

            FX

 

            X

 

            Based on the frequency of each pitch outcome, we could estimate the frequency of each sequence.   Per 1,000,000 plate appearances we would have:

 

            Sequence    Per 1,000,000 PA

            BB          62,500
            BC          62,500
            BS          62,500
            BF          31,250
            BX          31,250

            CB          62,500
            CC          62,500
            CS          62,500
            CF          31,250
            CX          31,250

            SB          62,500
            SC          62,500
            SS          62,500
            SF          31,250
            SX          31,250

            FB          31,250
            FC          31,250
            FS          31,250
            FF          15,625
            FX          15,625

            X           125,000

 

            As pitch sequences get longer, however, it is apparent that this type of calculation would become unmanageable.   If we added a third pitch, the five sequences above ending in “X” would end there, would not proceed to the third pitch, but the 16 “live sequences” would each divide into five more branches.   This would give us 85 three-pitch sequences to deal with, and we’d just be getting started.   After six pitches, we’d be dealing with thousands of possible sequences.   
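
            For anyone who wants to see how fast the tree grows, here is a rough counting sketch. It assumes the ordinary rules (ball four is a walk, strike three is a strikeout, a foul with two strikes leaves the count alone, a ball in play ends the plate appearance), and it counts finished sequences of any length up to n together with the sequences still live after n pitches. The function name is invented for this example.

```python
from collections import Counter

def count_sequences(n_pitches):
    """Distinct pitch sequences after n pitches: finished ones plus live ones."""
    live = Counter({(0, 0): 1})   # (balls, strikes) -> number of live sequences
    finished = 0
    for _ in range(n_pitches):
        nxt = Counter()
        for (balls, strikes), n in live.items():
            finished += n                        # X: ball in play ends the sequence
            if balls == 3:
                finished += n                    # B: ball four, a walk
            else:
                nxt[(balls + 1, strikes)] += n   # B: one more ball
            if strikes == 2:
                finished += 2 * n                # C or S: strike three
                nxt[(balls, strikes)] += n       # F: foul, count unchanged
            else:
                nxt[(balls, strikes + 1)] += 3 * n   # C, S or F each add a strike
        live = nxt
    return finished + sum(live.values())

for n in range(1, 7):
    print(n, count_sequences(n))
```

            The first three counts match the 5, 21 and 85 worked out by hand above; beyond that the total depends on exactly how strikeouts and walks are allowed to end a sequence.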

            We can simplify that, however, by grouping these into “counts” or “pitch states”.   After one pitch, there are three possible states:  a 1-0 count, an 0-1 count, or a ball put in play.   After two pitches there are only four possible pitch states:   a 2-0 count, a 1-1 count, an 0-2 count, or a ball put in play.  Four is a more manageable number than 21, and we can easily estimate the frequency of each of those four pitch states.   Given the assumptions we’re working with—each gate is 50/50—we would have:

 

            6.25%  counts of 2-0

            31.25% counts of 1-1

            39.06% counts of 0-2

            23.44% balls put in play

 

            After a third pitch we add another possible outcome, the strikeout.   But after three pitches we still have a manageable number of pitch state possibilities, which is six (3-0 count, 2-1 count, 1-2 count, 0-2 count, strikeout, ball put in play).   The percentages for those are all reasonably easy to calculate, since one can only get to a 2-1 count by way of a 1-1 count or a 2-0 count.  The percentages for each of these are:

 

            1.6% counts of 3-0

            11.7% counts of 2-1

            29.3% counts of 1-2

            4.9% counts of 0-2

            19.5% strikeouts

            33.0% balls put in play
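
            The count-by-count bookkeeping can be sketched as a small state machine: start everything at 0-0 and push the probabilities forward one pitch at a time. This is a sketch under the 50/50 gate assumptions, with a foul counting as a strike unless the batter already has two; the names are invented for the example.

```python
from collections import defaultdict

# Outcome shares when every gate is 50/50
MIX = {"B": 0.25, "C": 0.25, "S": 0.25, "F": 0.125, "X": 0.125}

def states_after(n_pitches, mix=MIX):
    """Share of plate appearances in each count or finished outcome after n pitches."""
    dist = {(0, 0): 1.0}
    clean_strike = mix["C"] + mix["S"]            # called or swinging strike
    for _ in range(n_pitches):
        nxt = defaultdict(float)
        for state, p in dist.items():
            if isinstance(state, str):            # walk / strikeout / in play: already over
                nxt[state] += p
                continue
            balls, strikes = state
            nxt["in play"] += p * mix["X"]
            if balls == 3:
                nxt["walk"] += p * mix["B"]
            else:
                nxt[(balls + 1, strikes)] += p * mix["B"]
            if strikes == 2:
                nxt["strikeout"] += p * clean_strike
                nxt[(balls, 2)] += p * mix["F"]   # foul with two strikes: count holds
            else:
                nxt[(balls, strikes + 1)] += p * (clean_strike + mix["F"])
        dist = dict(nxt)
    return dist

for n in (2, 3, 5):
    print(n, {k: round(100 * v, 2) for k, v in states_after(n).items()})
```

            Because each count can only be reached from one or two earlier counts, the work per pitch stays small no matter how many individual sequences feed into it.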

 

            After five pitches the number of pitch state possibilities maxes out at seven:

 

            3-2 count

            2-2 count

            1-2 count

            0-2 count

            Walk

            Strikeout

            Ball put in play

 

            And at that point the percentages for these are:

 

            3-2 count            6.10%
            2-2 count            3.05%
            1-2 count            0.76%
            0-2 count            0.08%
            Walk                 1.37%
            Strikeout           46.69%
            Ball put in play    41.95%

 

            At this point we have only 10% (actually, 9.99%) “live sequences”, and 90% (90.01%) sequence outcomes (walk, strikeout, or ball put in play).   As we add a 6th pitch, a 7th pitch. . .a 20th pitch, the live sequence percentages are driven toward zero, and the sequence outcomes approach 100%.   (Sorry there is so much jargon here. . .it’s just the nature of the problem, and I can’t figure out how to avoid it.)    Anyway, given these 50/50 gate assumptions (gate assumptions?  Really, is that necessary?). . .given these 50/50 gate assumptions, we will eventually reach these sequence outcome percentages:

 

            Walk                 3.4%
            Strikeout           53.1%
            Ball put in play    43.5%

 

            We have a strikeout/walk ratio of 16-1 and strikeout rates much higher than Nolan Ryan, but on a certain level our estimates here are not totally unreasonable.   There are pitchers who walk as few as 3% of the batters they face.   We can also calculate the pitches per batter that would result from these assumptions, which is 3.48—not too far off from the actual figure. 
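
            The long-run percentages and the pitches per batter do not require stepping through 20 pitches. Working backward from the full counts gives them directly. The sketch below is one way to do that, assuming (as above) that a foul with two strikes simply repeats the count; the function name is invented here and is not the spreadsheet's method.

```python
from functools import lru_cache

def plate_appearance(mix):
    """Return (walk, strikeout, in_play, expected_pitches) starting from 0-0."""
    b, x, f = mix["B"], mix["X"], mix["F"]
    k = mix["C"] + mix["S"]                       # called or swinging strike

    @lru_cache(maxsize=None)
    def solve(balls, strikes):
        if balls == 4:
            return (1.0, 0.0, 0.0, 0.0)           # walk, no more pitches
        after_ball = solve(balls + 1, strikes)
        if strikes == 2:
            # A foul returns to the same count, so divide by the chance of leaving it.
            leave = 1.0 - f
            return (b * after_ball[0] / leave,
                    (b * after_ball[1] + k) / leave,
                    (b * after_ball[2] + x) / leave,
                    (1.0 + b * after_ball[3]) / leave)
        after_strike = solve(balls, strikes + 1)
        s = k + f                                 # any strike, fouls included
        return (b * after_ball[0] + s * after_strike[0],
                b * after_ball[1] + s * after_strike[1],
                b * after_ball[2] + s * after_strike[2] + x,
                1.0 + b * after_ball[3] + s * after_strike[3])

    return solve(0, 0)

MIX = {"B": 0.25, "C": 0.25, "S": 0.25, "F": 0.125, "X": 0.125}
walk, strikeout, in_play, pitches = plate_appearance(MIX)
print(f"walk {walk:.1%}, strikeout {strikeout:.1%}, "
      f"in play {in_play:.1%}, pitches per PA {pitches:.2f}")
```

            With the 50/50 mix this should land on the same figures quoted above: about 3.4% walks, 53.1% strikeouts, 43.5% balls in play and 3.48 pitches per batter.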

            But what we have accomplished, really, is to create a spreadsheet that calculates the percentages.   I’ll post the spreadsheet along with this article; you can download the spreadsheet and play around with the gate assumptions if you like.

            It is time now to look for actual data.   First gate:  each pitch is either swung at (a) or taken (b), but what are the actual percentages?   I think the actual percentage is that 54% are taken and 46% are swung at, so let’s plug that into our spreadsheet.   Putting in that percentage on that gate, we get the following sequence outcomes:

 

            Walk                 4.4%
            Strikeout           54.4%
            Ball put in play    41.2%
            Pitches per PA       3.58

 

            Our walk percentage and pitches per plate appearance have moved closer to the true norms, but our strikeout problem is even worse.  Second gate. . .each pitch which is taken is either called a ball (a) or a strike (b).   In fact, 68 to 69% of pitches which are taken are called balls (or are not called strikes), so let’s plug .685 into our spreadsheet, and re-calculate:

 

            Walk                12.1%
            Strikeout           43.5%
            Ball put in play    44.4%
            Pitches per PA       3.86

 

            Our strikeout ratio drops to 3.6 to 1 and our pitches per plate appearance are now just about right, but our walk rate is high, our strikeout rate is still unreasonably high, and the ball-in-play percentage is still unreasonably low.  Eric Gagne, with the Dodgers in 2003, struck out 44.8% of the batters he faced, so we are now within the parameters of possibility, but he is the only pitcher in history (50 or more innings, through 2007) with a strikeout rate this high.   Third gate:  each pitch that is swung at is either hit (a) or missed (b).   In fact, 80 to 81% of pitches that are swung at are hit, fair or foul, so let’s plug in that number (.805) for the third gate, and re-calculate again:

 

            Walk                10.9%
            Strikeout           23.9%
            Ball put in play    65.2%
            Pitches per PA       3.53

 

            We’re getting closer to the norms, but we’re still off.  Fourth gate:  each pitch that is hit is either hit foul (F) or put in play (X).   The actual percentage is that about 53% of balls hit are hit into fair territory.    Let’s plug that into our spreadsheet, and re-calculate the percentages:

 

            Walk                10.4%
            Strikeout           22.2%
            Ball put in play    67.4%
            Pitches per PA       3.43

 

            We’re better, but we still have too many strikeouts, too many walks, and (now) not enough pitches per plate appearance, and at this point, we’re done.   We’re done with the gate assumptions; now we have to do something else to make this thing work.  

            Our problem now is very easy to diagnose, but will be extremely difficult to solve.  Our problem is that we have assumed that these pitch sequences are random.  In reality, of course, they are not random.   They are non-random in ways that have a definite impact.   A pitcher is not as likely to throw a strike on 1-2 as he is on 3-2.  

            There’s a general problem with the non-randomness, but I would bet that the lion’s share of our problem is contained on only three counts:  0-2, 3-0 and 0-0.   Pitches taken on 0-2 are not 31% called strikes; it is probably closer to 15%.  It may be less than that.  Pitches taken on 3-0 are not 69% called balls; it’s probably about 5%.    And first pitches, I would think, are atypical in both regards:  they are probably taken 70-75% of the time, rather than 54%, and, when taken, they are called strikes probably 40-45% of the time, rather than 31-32%.    

            We can “intervene” in our spreadsheet and say that 95% of pitches taken on 3-0 are going to be called strikes, and re-calculate the odds from that, but the problem is that this screws up the other percentages.   We can HOPE that these three “exceptional pitches” will balance out and not screw up the percentages, but we don’t really know.
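
            One way to sketch that kind of intervention is to let the gates depend on the count: use the overall percentages everywhere, and override them only on the exceptional counts. Everything below is illustrative. The override numbers are just the rough guesses from the text (midpoints where a range was given), and the names are invented for this example.

```python
from functools import lru_cache

# Overall gate values used so far: take, ball-when-taken, contact, fair-when-hit
BASE = {"take": 0.54, "ball": 0.685, "contact": 0.805, "fair": 0.53}

# Guessed count-specific values; none of these are measured.
OVERRIDES = {
    (0, 0): {"take": 0.725, "ball": 0.575},  # first pitch: taken 70-75%, strike 40-45% when taken
    (0, 2): {"ball": 0.85},                  # about 15% called strikes when taken
    (3, 0): {"ball": 0.05},                  # about 95% called strikes when taken
}

def outcomes(base, overrides=None):
    """Return (walk, strikeout, in_play) shares; gates may differ by count."""
    overrides = overrides or {}

    @lru_cache(maxsize=None)
    def solve(balls, strikes):
        if balls == 4:
            return (1.0, 0.0, 0.0)
        g = {**base, **overrides.get((balls, strikes), {})}
        swing = 1.0 - g["take"]
        b = g["take"] * g["ball"]
        k = g["take"] * (1.0 - g["ball"]) + swing * (1.0 - g["contact"])
        x = swing * g["contact"] * g["fair"]
        f = swing * g["contact"] * (1.0 - g["fair"])
        after_ball = solve(balls + 1, strikes)
        if strikes == 2:                      # foul repeats the count
            w, so, ip = (b * v for v in after_ball)
            return (w / (1 - f), (so + k) / (1 - f), (ip + x) / (1 - f))
        after_strike = solve(balls, strikes + 1)
        w, so, ip = (b * u + (k + f) * v for u, v in zip(after_ball, after_strike))
        return (w, so, ip + x)

    return solve(0, 0)

flat = outcomes(BASE)
adjusted = outcomes(BASE, OVERRIDES)
print("flat gates:     walk %.3f, strikeout %.3f, in play %.3f" % flat)
print("with overrides: walk %.3f, strikeout %.3f, in play %.3f" % adjusted)
```

            The catch described above still applies: once the exceptional counts are overridden, the pitch mix summed over all counts no longer matches the overall 54% and 68.5% figures, so the base values would need to be re-tuned to compensate.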

            Well. . .that’s enough of that for now.   We’ve made a little progress on understanding this issue; we’ve created a framework to think more about it later.   Let’s move on.org.

            But first, Whitey Ford, 1965.   I promised you estimates of his gate percentages; I’m going to get you estimates of his gate percentages, even though we may not be quite ready to get accurate ones.  

            Our pitcher above, facing 1000 batters, would strike out 222 and walk 104, whereas Ford struck out 162 and walked only 50.    It is apparent, then, that his “take percentage” must be lower than we have allowed for, and his “contact percentage” probably higher.  We have been working with a take percentage of 54%; let’s move that to 53%, and see what happens:

 

            Walk                 9.8%
            Strikeout           22.0%
            Ball put in play    68.2%
            Pitches per PA       3.40

 

            Let’s change the form.   What we care about here is his expected walks—98—and his expected strikeouts—220.   The other stuff, we can live without.   By moving his take percentage from 54% to 53%, we’ve decreased his walks from 104 to 98 and his strikeouts from 222 to 220, so that’s progress, but we still have a long way to go.   Let’s change his “contact percentage” from .805 to .81, and see what we have:

 

            Walks                97

            Strikeouts        217

           

            Still crawling forward.   Let’s change the “take percentage” to 52%, and the contact percentage to .815:

 

            Walks                91

            Strikeouts        212

 

            Still making progress.   We’re assuming that 68.5% of pitches taken by Ford’s batters are called balls.  Let’s make it 67%:

 

            Walks                86

            Strikeouts        220

 

            Hmm.   Going backward on the strikeouts.    We’re assuming that 53% of hit balls against Ford are hit fair.   It seems unlikely that this varies very much by pitcher, but let’s assume it goes up to .532:

 

            Walks                85

            Strikeouts        219

 

            Wider chart. . .these were the initial assumptions and results:

 

            Gate 1    Gate 2    Gate 3    Gate 4    Strikeouts    Walks
            .54       .685      .805      .53       222           104

 

            And these are what we have now:

 

            Gate 1    Gate 2    Gate 3    Gate 4    Strikeouts    Walks
            .52       .670      .815      .53       219           85

 

            Let’s inch a little bit closer in that direction, and see what we have.  

 

            Gate 1    Gate 2    Gate 3    Gate 4    Strikeouts    Walks
            .51       .66       .82       .534      218           76
            .50       .66       .83       .536      211           71
            .49       .66       .83       .538      204           65
            .49       .66       .85       .538      193           65
            .48       .66       .86       .540      188           59
            .47       .66       .86       .545      179           54
            .47       .66       .87       .545      177           54
            .47       .66       .87       .545      175           55
            .47       .67       .87       .545      172           56
            .46       .67       .87       .550      164           51
            .46       .67       .87       .552      162           50

 

            There we are.   If we assume that, for Whitey Ford in 1965:

 

            45.7% of pitches were taken, 54.3% swung at

            66.5% of pitches taken were called balls, 33.5% called strikes

            87.15% of pitches swung at resulted in contact

            55.2% of balls hit were put into play (hit fair)

 

            Then the expected results would be 162 strikeouts, 50 walks per 1000 batters faced.  
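
            The inching-along can also be handed to a loop. The sketch below first checks the four percentages just listed, then runs a crude grid search over the take and contact percentages while holding the other two gates fixed. Holding two gates fixed is a choice made for this example: with four gates and only two targets, many different combinations will fit. The function names are invented here, and the model is the same random-sequence one used throughout.

```python
def expected_k_bb(take, ball, contact, fair, batters=1000):
    """Expected strikeouts and walks per `batters` plate appearances."""
    b = take * ball                                        # ball
    k = take * (1 - ball) + (1 - take) * (1 - contact)     # called or swinging strike
    f = (1 - take) * contact * (1 - fair)                  # foul
    walk, so = {}, {}
    for balls in range(3, -1, -1):                         # work back from full counts
        for strikes in range(2, -1, -1):
            w_ball = 1.0 if balls == 3 else walk[(balls + 1, strikes)]
            k_ball = 0.0 if balls == 3 else so[(balls + 1, strikes)]
            if strikes == 2:                               # foul repeats the count
                walk[(balls, 2)] = b * w_ball / (1 - f)
                so[(balls, 2)] = (b * k_ball + k) / (1 - f)
            else:
                walk[(balls, strikes)] = b * w_ball + (k + f) * walk[(balls, strikes + 1)]
                so[(balls, strikes)] = b * k_ball + (k + f) * so[(balls, strikes + 1)]
    return batters * so[(0, 0)], batters * walk[(0, 0)]

def fit_take_and_contact(target_k, target_bb, ball=0.665, fair=0.552):
    """Grid-search gates 1 and 3 for the closest match to the two targets."""
    best = None
    for i in range(400, 601, 5):                           # take: .400 to .600
        for j in range(750, 951, 5):                       # contact: .750 to .950
            take, contact = i / 1000, j / 1000
            k, bb = expected_k_bb(take, ball, contact, fair)
            err = (k - target_k) ** 2 + (bb - target_bb) ** 2
            if best is None or err < best[0]:
                best = (err, take, contact, k, bb)
    return best[1:]

stated_k, stated_bb = expected_k_bb(0.457, 0.665, 0.8715, 0.552)
print("stated gates: %.1f strikeouts, %.1f walks" % (stated_k, stated_bb))

fit_take, fit_contact, fit_k, fit_bb = fit_take_and_contact(162, 50)
print("grid search:  take %.3f, contact %.3f -> %.1f strikeouts, %.1f walks"
      % (fit_take, fit_contact, fit_k, fit_bb))
```

            A finer grid, or a search over all four gates with outside data to pin down two of them, would be the natural next step.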

           We have AN estimate.   Later, we’ll work on getting a better estimate. 

 

           If you'd like to view a spreadsheet of these results, you can download it via this link.

 

 

2.  The Right Place for an RBI Man

 

            How often does the player who leads the league in RBI do so simply because he has the most at bats with runners in scoring position?   And, extending the issue. . .since we know the league leader in RBI is often the MVP, how often does it happen that somebody wins the MVP Award, in essence, simply because he has more RBI opportunities than anybody else?

 

            Due to the miracle of Retrosheet we can actually study this now.   Retrosheet has play by play, occasionally missing a plate appearance, for both leagues back to 1954 and for the National League in 1953.   That gives us 111 leagues to look at—both leagues 1954-2008, plus 1953 in the National.  

            In 2004 the league leaders in RBI were Miguel Tejada (150) and Vinny Castilla (131).  The league leaders in at bats with runners in scoring position were Miguel Tejada (208) and Vinny Castilla (203).   This also happened in 1965.  In 1965 the league leaders in RBI were Deron Johnson (130) and Rocky Colavito (108).   The league leaders in at bats with runners in scoring position were Deron Johnson (180) and Rocky Colavito (171).  

            The real answer, though, is that it happens less often than I would have guessed.   Since 1953 there have been 22 players who led the league both in RBI and at bats with runners in scoring position.   Thus, 80% of the time, the player who leads the league in at bats with runners in scoring position does not lead the league in RBI.

            In fact, over half the time the player who leads the league in RBI does not finish among the top four in the league in at bats with runners in scoring position.   I count 50 times since 1953 (45%) that the league leader in RBI has been among the top four in at bats with runners in scoring position, but 61 times that he has not.  Even that 45% is a little bit artificially high because of ties. . .ties in who leads the league in RBI, ties in who leads the league in at bats with runners in scoring position.  Occasionally two players tie for the league lead in RBI, and if either of them was in the top four in the league in at bats with runners in scoring position, I counted it as a hit.   I would have assumed that if two players tied for the league lead in RBI one of them would have to be among the league leaders in the other category, but not so.  In 1995, for example, Albert Belle and Mo Vaughn (founder of MoVaughn.org) tied for the American League lead in RBI with 126, although neither of them was among the league leaders in at bats with runners in scoring position. 

            And as to the other thing. . .winning the MVP Award because you have a huge number of at bats with runners in scoring position. . .that basically never happens.   There is only one time since 1953 that you can make an argument that that has happened.   There are only six times since 1953 that the player who led the league in at bats with runners in scoring position has won the MVP Award, and five of them are pretty legit.  

Henry Aaron led the league in at bats with runners in scoring position in 1957 (156), but he hit .322 with 44 homers, 132 RBI, and it’s really hard to argue with his MVP selection. 

Roberto Clemente led the NL in at bats with runners in scoring position in 1966 (163), and this did help him win the MVP Award in one of the rare seasons that he didn’t lead the league in hitting, but again, it’s hard to argue that the RBI opportunities made him the MVP.   I probably wouldn’t vote for Clemente as the MVP, but his at bats with runners in scoring position weren’t that large (163), and, given his defensive skills and the fact that two or three of the nation’s leading sportswriters had been lobbying to get him an MVP Award for several years before then, I don’t think you can draw a strong line between the two.

Joe Torre led the NL in at bats with runners in scoring position in 1971 (190) and did win the MVP Award, but then, Torre hit .363 and had 230 hits, so it’s not really a questionable MVP Award, although Willie Stargell also had a great year.

George Foster led the NL in at bats with runners in scoring position in 1977 (197), and did win the MVP, but again, appears to be the legitimate MVP.   He hit .320, and his 52 homers and 149 RBI were both the highest totals in the major leagues between 1966 and 1995.    He would obviously have led the league in RBI even had he not led in at bats with Runners in Scoring Position. 

A-Rod led the American League in at bats with Runners in Scoring Position in 2005 (186), and did win the MVP Award, but again, it’s not an especially questionable selection.  And it’s impossible to argue that leading the league in at bats with runners in scoring position that one year manufactured A-Rod’s aura.

The one player who probably did get an MVP Award in large measure because of an unusual number of at bats with runners in scoring position was Don Baylor in 1979.   Baylor led the league in at bats with runners in scoring position with one of the highest totals ever, 215, and drove in 139 runs in part because of that.  Baylor was a player who was well-liked by the press and was perceived as a leader and a winner, even before the MVP Award, but he was also a halftime outfielder/halftime DH.   He couldn’t throw, and he wasn’t a great outfielder when he could throw, and he wasn’t the best hitter in the league even in 1979, which was his best year.  It is unlikely that he would have been an MVP contender had he not had an unusual number of RBI opportunities.

Here’s a story about him, though, the story told in anecdotal form without references, so dock me three points for accuracy.  When Baylor was a rookie in 1972 he was struggling, and, on the advice of a coach, he cut down his stroke to try to make better contact.   Earl Weaver immediately pulled him aside and told him not to do that.   “You’re going to be the American League MVP in 1978,” Weaver told him, “and I don’t want you up there slapping at the ball.”  Weaver missed his MVP season by one year.

Anyway, other than Aaron, Clemente, Torre, Foster, A-Rod and Baylor, no one has ever led the league in at bats with runners in scoring position and also won the MVP Award.   I’m surprised that there isn’t more of a connection.

 

 

            If my notes are correct, the highest number of at bats with runners in scoring position on record is 218, by Derek Bell in 1996.  Bell hit .263 with 17 homers, 113 RBI, but still missed leading the league in RBI by 37.  

            Not a lot of players have gotten to hit 200 times in a season with runners in scoring position.   In the first 20 years of the data, 1953-1973, the only player to bat 200 times in the situation was Tommy Davis in 1962, the year Maury Wills stole 104 bases and Davis hit .346 with 27 homers, 153 RBI.    Davis had 213.    Since 1974 it has been more common, but still, no one has batted 200 times with runners in scoring position since 2004.    Castilla, Tejada and Miguel Cabrera were all over 200 in 2004, though.   I believe Baylor’s 215 is the highest known total in the American League.

            Frank Malzone had fantastic numbers of at bats with runners in scoring position the first half of his career.   Malzone led the majors in at bats with runners in scoring position in 1957, his first year as a regular, with 176, and also hit .347 with runners in scoring position.   He led the majors again in 1958, with 181 (he led the majors by a whopping 30), and he hit .320 with runners in scoring position.   He led the American League in at bats with runners in scoring position in 1959 (170), led the majors again in 1960 (172), and finished second in the American League in 1961 (157), and third in 1962 (165).  

            He stopped hitting notably well with runners in scoring position after the first two years, though (.347, .320, .253, .250, .293, .291, .293, career average of .290).  The question arises of why Frank Malzone, among all players in history, would have such remarkable at bat totals with runners in scoring position. 

            Part of it is doubles.   The Red Sox in that era almost always led the league in doubles.  But I wonder if, after Malzone hit so well with runners in scoring position in 1957, his first year, the Red Sox didn’t start bunting to get him to the plate with men in scoring position, perhaps actually believing that he became a better hitter in RBI situations?

            The NL equivalent to Malzone was Kenny Boyer.   Boyer was a similar player to Malzone, except that he had a little more power (and was also a step quicker as a young player, but that may be just that he got to the majors at a younger age.)  Anyway, similar players, Boyer was better.   

            Boyer led the majors in at bats with runners in scoring position in 1956 (173), led the National League in 1958 (151), finished second in the National League in 1959 (163), tied for second in 1961 (160), finished fourth in 1962 (169), led again in 1963 (181), and missed leading the league by one in 1964, his MVP year, when he had a career-high 193 at bats with runners in scoring position, and a career-high 119 RBI. 

            Brooks Robinson, another similar player, also has very high totals.  He led the American League in 1962 (178), finished third in 1964 (165), led again in 1971 (172), and is near the league lead in some other seasons. 

            In general, however, there is more randomness than I would have expected in who finishes at (and near) the top of the league in at bats, runners in scoring position.   Obviously there are predictable elements.   You need to be a middle-of-the-order hitter who stays healthy to lead the league in this category, and it helps if you bat behind a base stealer.   If you are TOO good a hitter—Mickey Mantle or Barry Bonds—you may lead the league in plate appearances with runners in scoring position, but not at bats.  But a lot of it doesn’t make intuitive sense, and appears to be just random.  Quite a few players from not-very-good teams with not-very-good offenses have led the league in at bats with runners in scoring position.   Bill Mazeroski led the National League in 1967.  Bob Oliver of the expansion Royals led the American League in 1970—and led again in 1972, when he was playing for the Angels.   Butch Wynegar led his league in 1977, Enos Cabell in 1979, Willie Upshaw in 1984, Keith Moreland and Julio Franco in 1985, Nick Esasky in 1989.  Ron Gant and Benito Santiago tied for the National League lead in 1991. 

Todd Zeile led his league in 1993, J. T. Snow in 1995, Derek Bell set the all-time record in 1996.  Players from the Kansas City Royals led the American League in 1998 and 2000 (different players).   Preston Wilson led the National League in 2000 and 2003; Ryan Zimmerman of the Nationals led the NL in 2006.   It’s not always the people you would expect. 

            Don Mattingly, although I have tended to diminish his RBI counts as being created by hitting behind Rickey Henderson, actually never led his league in at bats with runners in scoring position.  Steve Garvey led only once, which ties him with Bobby Tolan; Tolan led the National League in 1972.   He hit .283 with 8 homers that year, and hit .236 with runners in scoring position—47 for 199. 

            In 1957, when he first led the American League in this category, Frank Malzone batted all over the lineup—15 games batting leadoff, 26 games batting second, 31 batting cleanup, 15 batting fifth, 65 batting sixth, and one batting seventh.   His high at bats total with runners in scoring position can’t be a function of his position in the batting order, because he didn’t really have one.

            On the other side of this issue is Willie Mays.   Mays had 84 at bats with runners in scoring position in 1956, and 79 (!!) in 1957.   How do you construct your lineup so badly that you’re getting Willie Mays 80 at bats a year with runners in scoring position?    In 1956 Johnny Temple was a leadoff man who batted 436 times with the bases empty—but still had 117 at bats with runners in scoring position.   He had 113 more in 1957.

            Who would have guessed that, in 1956 and 1957, Johnny Temple was batting far more times with runners in scoring position than Willie Mays was?   And it’s not walks. . .even if you added in his walks with runners in scoring position (for Mays), he still wouldn’t catch Temple. 

            This may be a function of the fact that Temple was batting leadoff, and the pitchers usually bunted when there was a man on first.  Mays never led the league in at bats with runners in scoring position, nor did Mantle.  For his career as much as has been documented by Retrosheet, 23.4% of Mays’ plate appearances were with Runners in Scoring Position, 24.4% for Mantle, as opposed to 27.5% for Malzone, 28.1% for Ken Boyer, and 27.7% for Brooks Robinson.  

 

 

            About three years ago. . .I probably shouldn’t do things like this from memory, but I am unable to locate my research, so what the hell.   Three or four years ago, when SABR was providing searchable access to millions of old newspaper files, I tried to work out the origin of the phrase “in scoring position”, which somebody had suggested was a 1970s phrase.   I knew that wasn’t right because I remember it from when I became a baseball fan.  

            I concluded that the phrase was originated or popularized by a Chicago sportswriter about 100 years ago; I believe it was Warren Brown, although I couldn’t swear to that.   Of the first 25 instances in which I could find this phrase being used, this one sportswriter was using it (and in different publications) about 15 to 20 times.  

            But in the first 40 years of its use, the expression was used with almost equal frequency in different sports.  I found hundreds of examples of the phrase “in scoring position” being used in reporting on basketball, football, hockey, tennis, golf, bowling, wrestling, automobile racing and—with surprising frequency—polo.  I found it used in basically every sport, including those which don’t exactly “score”, like foot races.  

            After World War II—about 1948 to 1958—the phrase became a baseball expression.   I suspect that it settled into baseball because, in baseball, the phrase had a definite and specific meaning, whereas in the other sports its use was somewhat ill-defined.   When exactly is a basketball player “in scoring position”?   One can’t say, or one can say when he is in scoring position, but not necessarily when he isn’t.   I think this caused the expression to be gradually taken over by baseball.

 

  

 

3.  The Home Field Advantage of West Coast Teams

 

            What we will call the Home Field Advantage here should more properly be called the Home Field Differential.   “Differential” is a rather awkward word, so I will use “advantage”, but if a team is dis-advantaged on the road, that will also show up as a home field “advantage”.  I am trying to take note of the fact that this “advantage” is not a true advantage; it could equally well be that we are measuring a disadvantage and calling it an advantage.

            Another thing I have wondered about for a long time is whether the Home Field Advantage is larger for West Coast teams than for other teams.   

            The short but definitive answer is that it is not.   I figured the home field advantage for each major league team over the ten years 1999-2008.   The largest home field “edge” in those ten years has been for the Rockies, 129.5 games, followed by Tampa Bay, 103.5 games.   The smallest has been for the Cincinnati Reds—22.5 games.   These are the totals for each team:

 

            Colorado         129.5
            Tampa Bay        103.5
            Texas             95.0
            Pittsburgh        82.0
            Oakland           81.0
            Florida           80.0

            Mon-Wash          77.5
            Minnesota         76.0
            White Sox         71.0
            San Francisco     70.5

            Milwaukee         69.0
            St. Louis         69.0
            Houston           68.5
            Seattle           67.0
            Boston            63.5
            Arizona           63.0

            Toronto           58.5
            Atlanta           57.0
            Yankees           55.0
            San Diego         54.5
            Angels            54.0
            Kansas City       53.5
            Detroit           52.0

            Mets              48.5
            Dodgers           48.0
            Cubbie            43.5
            Baltimore         40.0
            Cleveland         40.0

            Philadelphia      36.5
            Cincinnati        22.5

 

            The Reds are +37 over the last five years, but were -14.5 in the five years before that.  The Red Sox were -6.5 over the years 1999-2002, but have been +70 since 2003. 

            It seems obvious that the “hot weather teams” (the two Florida teams and the Rangers) have larger home field effects because both they and their opponents are making larger-than-normal adjustments to the weather.  I’m not talking about the game-time temperatures, but the normal walking-around temperatures.   If you spend any time in Florida, you know that the heat down there really zaps you for about ten days after you get there.   After a while your blood thins out, and you don’t feel it so much. 

            You can make of the data whatever you will, but what you can’t make of it is that West Coast teams have larger home field effects than other teams.    They clearly don’t.  The West Coast teams are spread out in the middle of the chart:

 

             5.  Oakland          81.0
            10.  San Francisco    70.5
            14.  Seattle          67.0
            20.  San Diego        54.5
            21.  Angels           54.0
            25.  Dodgers          48.0
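
A quick way to check the "middle of the chart" claim is to compare the average edge of the six West Coast teams with the average for all thirty. The sketch below just re-keys the numbers from the table above; the variable names and the `mean` helper are mine, not part of the original study.

```python
# Ten-year home field edges (1999-2008), copied from the table above.
edges = {
    "Colorado": 129.5, "Tampa Bay": 103.5, "Texas": 95.0, "Pittsburgh": 82.0,
    "Oakland": 81.0, "Florida": 80.0, "Mon-Wash": 77.5, "Minnesota": 76.0,
    "White Sox": 71.0, "San Francisco": 70.5, "Milwaukee": 69.0,
    "St. Louis": 69.0, "Houston": 68.5, "Seattle": 67.0, "Boston": 63.5,
    "Arizona": 63.0, "Toronto": 58.5, "Atlanta": 57.0, "Yankees": 55.0,
    "San Diego": 54.5, "Angels": 54.0, "Kansas City": 53.5, "Detroit": 52.0,
    "Mets": 48.5, "Dodgers": 48.0, "Cubbie": 43.5, "Baltimore": 40.0,
    "Cleveland": 40.0, "Philadelphia": 36.5, "Cincinnati": 22.5,
}
west_coast = ["Oakland", "San Francisco", "Seattle",
              "San Diego", "Angels", "Dodgers"]


def mean(values):
    """Plain arithmetic mean of an iterable of numbers."""
    values = list(values)
    return sum(values) / len(values)


all_mean = mean(edges.values())
west_mean = mean(edges[team] for team in west_coast)
print(f"All teams: {all_mean:.1f}   West Coast: {west_mean:.1f}")
# All teams: 64.3   West Coast: 62.5
```

The West Coast average comes out slightly below the overall average, which is consistent with the conclusion above.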

           

 

4.  Vada Pinson vs. Orlando Cepeda

 

            I play in a Ballpark league in which we are currently playing based on the National League in 1957.   Orlando Cepeda comes into the league next year, and the guy who has the #1 draft pick insists that he is not taking Cepeda; he’s going for Vada Pinson.  In part this has to do with design issues in Ballpark baseball, but I got to wondering who was a better player in real life:  Cepeda, or Pinson?

            Vada Pinson was more or less the Grady Sizemore of the early 1960s, while Cepeda was something like a right-handed Lance Berkman.   Cepeda is a Hall of Famer but a marginal one; Pinson is not in the Hall of Fame but could be.   Pinson had 2,757 hits (400 more than Cepeda), hit 256 homers, stole 305 bases and was a fine defensive center fielder.   Cepeda hit 379 homers and averaged .297 (numbers similar to Jim Rice) and also ran OK as a young player, stealing 142 bases, but was not a Gold Glove first baseman, and was a brutal outfielder when called upon to attempt the outfield.   Let’s run Win Shares and Loss Shares for them.

            In 1958 Cepeda was the National League Rookie of the Year.   The 19-year-old Pinson, although starting the season in the majors, hit .194 into early May, was sent out, and didn’t return until mid-September. 

 

            YEAR  Player  Age    G   AB  HR  RBI   AVG   SLG   OBA   OPS  WS  LS  W Pct
            1958  Cepeda   20  148  603  25   96  .312  .512  .342  .854  18  14   .571
            1958  Pinson   19   27   96   1    8  .271  .375  .352  .727   3   3   .510

 

            Pinson hit .412 after his September recall, lifting his average 77 points, and cementing his job at the start of the 1959 season.  

So Cepeda starts out a little bit ahead.   In 1959 both Pinson and Cepeda had outstanding seasons; in 1960 they both had somewhat disappointing seasons, in that their batting averages both slipped under .300 and their RBI counts dropped, although both remained extremely valuable players:

 

            YEAR  Player  Age    G   AB  HR  RBI   AVG   SLG   OBA   OPS  WS  LS  W Pct
            1959  Cepeda   21  151  605  27  105  .317  .522  .355  .878  22   9   .699
            1959  Pinson   20  154  648  20   84  .316  .509  .371  .880  23  11   .672
            1960  Cepeda   22  151  569  24   96  .297  .497  .343  .840  21   9   .696
            1960  Pinson   21  154  652  20   61  .287  .472  .339  .811  22  10   .693

 

            There’s really nothing to choose between them at this point.   In 1961 both players were serious MVP candidates.   Cepeda led the league in home runs and RBI, and finished second in the MVP voting; Pinson hit .343 and finished third, and their vote counts were close.   As we see it, Pinson was actually the more valuable player:

 

            YEAR  Player  Age    G   AB  HR  RBI   AVG   SLG   OBA   OPS  WS  LS  W Pct
            1961  Cepeda   23  152  585  46  142  .311  .609  .362  .970  23   8   .754
            1961  Pinson   22  154  607  16   87  .343  .504  .379  .883  25   7   .777

 

            At this point Cepeda has a career won-lost record of 85-40 (.679), while Pinson is 72-30 (.702).    In 1962 both players again had “down” seasons, although, being the players they were at that time, really good down seasons.   They were both championship-quality players.

 

            YEAR  Player  Age    G   AB  HR  RBI   AVG   SLG   OBA   OPS  WS  LS  W Pct
            1962  Cepeda   24  162  625  35  114  .306  .518  .347  .865  22  13   .637
            1962  Pinson   23  155  619  23  100  .292  .477  .341  .817  21  12   .637

 

            Their .637 winning percentages were the lowest for both players since 1958, but both still drove in 100 runs.   In 1963 both bounced back with outstanding seasons:

 

            YEAR  Player  Age    G   AB  HR  RBI   AVG   SLG   OBA   OPS  WS  LS  W Pct
            1963  Cepeda   25  156  579  34   97  .316  .563  .366  .929  26   5   .837
            1963  Pinson   24  162  652  22  106  .313  .514  .347  .861  26   9   .741

 

            Up to this point in his career Cepeda has generally done a little better in MVP voting, although neither has made a serious run at the award other than in 1961.   In 1963, however, Pinson finished tenth in the MVP voting—which is not very high, considering his season—but Cepeda, despite having what we regard as the best season of his career to this point, was shut out, not mentioned in the voting.   There’s a difference between how we evaluate Cepeda’s seasons, and how the MVP voters evaluated them.   The voters probably marked him down because he failed to drive in 100 runs. 

            He failed to drive in 100 runs because his at bats with men on base dropped from 300 to 249 and his at bats with runners in scoring position dropped from 174 to 128.   He hit .336 with runners in scoring position, but it was just a year that he didn’t have a lot of ribbies out there waiting for him.   As we look at the season, we mark it as his best season because he had what was then a career-high .366 on-base percentage, while the offensive context on the Giants changed sharply.   In 1962 the National League ERA was 3.94, and the Giants’ park run index was 100.   In 1963 the NL ERA was 3.29, and the park run index was 91.   In our view, Cepeda’s 1963 season was the best he had had up to that point, and it’s hard to believe that there were 24 better players in the league.
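
To get a feel for how much the run environment shrank, here is a back-of-the-envelope sketch. It is not the Win Shares calculation; it simply multiplies the change in league ERA by the change in park effect. The function name `context_factor` and the assumption that the park index applies to half of a team's games (the home half) are mine.

```python
def context_factor(era_new, era_old, park_new, park_old):
    """Rough ratio of the new run environment to the old one.

    Assumption (not from the article): a park run index of 91 means home
    games see 91% of normal scoring, and half of a team's games are at home.
    """
    league_ratio = era_new / era_old
    park_effect_new = (1 + park_new / 100) / 2
    park_effect_old = (1 + park_old / 100) / 2
    return league_ratio * park_effect_new / park_effect_old


# NL ERA and Giants park run index, 1963 versus 1962, from the text above.
factor = context_factor(era_new=3.29, era_old=3.94, park_new=91, park_old=100)
print(f"{factor:.2f}")  # 0.80
```

On these assumptions a run in Cepeda's 1963 context was worth roughly a quarter more than a run in his 1962 context, which is why similar raw numbers grade out better.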

            In any case Cepeda’s career won-lost record, at this point, was 133-58 (.697), while Pinson’s was 119-52 (.698).   They were still playing at essentially the same level.  In 1964 both were down a little, but Pinson was down more seriously:

 

            YEAR  Player  Age    G   AB  HR  RBI   AVG   SLG   OBA   OPS  WS  LS  W Pct
            1964  Cepeda   26  142  529  31   97  .304  .539  .361  .900  20   8   .711
            1964  Pinson   25  156  625  23   84  .266  .448  .316  .764  21  14   .601

 

            This put Cepeda ahead, but in 1965 Cepeda was injured, while Pinson had a good season:

 

 

            YEAR  Player  Age    G   AB  HR  RBI   AVG   SLG   OBA   OPS  WS  LS  W Pct
            1965  Cepeda   27   33   34   1    5  .176  .294  .225  .519   0   2   .128
            1965  Pinson   26  159  669  22   94  .305  .484  .352  .836  23  12   .652

 

            This gave Pinson back the year advantage that Cepeda had had in 1958, but this time it was Pinson who was stiffed in the MVP voting.  Despite getting 204 hits, despite being second in the league in hits and third in total bases, despite playing a pretty good center field for a contending team, Pinson was not among the 31 players mentioned in the NL MVP voting.  

            Two years earlier Cepeda was a little ahead in career Win Shares, but Pinson was ahead in percentage.   Now Pinson is ahead in career Win Shares (162 to 153), but Cepeda is ahead in winning percentage (.693 to .677). 

            Pinson, however, had had his last really good year.  Cepeda was never consistent after his injury year, but he had a couple of good years left.   Early in 1966 Cepeda was traded from San Francisco to St. Louis.   The first line below is his record with San Francisco; the second line is his record with St. Louis:

  

            YEAR  Player  Age    G   AB  HR  RBI   AVG   SLG   OBA   OPS  WS  LS  W Pct
            1966  Cepeda   28   19   49   3   15  .286  .510  .352  .862   2   1   .679
            1966  Cepeda   28  123  452  17   58  .303  .469  .362  .831  16   9   .638
            1966  Pinson   27  156  618  16   76  .288  .442  .326  .768  16  17   .487

 

            Vada Pinson in 1967 led the National League in triples (13), stole 26 bases and hit a respectable .288 in a league where runs were scarce.   Cepeda, however, won (and probably deserved) the NL MVP Award:

 

            YEAR  Player  Age    G   AB  HR  RBI   AVG   SLG   OBA   OPS  WS  LS  W Pct
            1967  Cepeda   29  151  563  25  111  .325  .524  .399  .923  27   4   .882
            1967  Pinson   28  158  650  18   66  .288  .454  .318  .771  22  13   .623

 

            Cepeda was now 198-82 in his career (.709), while Pinson was 200-108 (.651).  Both players were still short of 30, and both appeared to be solidly on a Hall of Fame path.   Both, however, had disappointing seasons in 1968:

 

            YEAR  Player  Age    G   AB  HR  RBI   AVG   SLG   OBA   OPS  WS  LS  W Pct
            1968  Cepeda   30  157  600  16   73  .248  .378  .306  .685  19  16   .544
            1968  Pinson   29  130  499   5   48  .271  .383  .311  .694  13  15   .469

 

            You can alibi about the context; we all know that 1968 was the year of the pitcher.  The fact is that neither man was really very good.   This was the second time in three years that Pinson had been a sub-.500 player, and unfortunately he was to make a habit of it.

 

            The Cardinals re-built their team after the 1968 season—for no apparent reason, and unwisely even if one assumes that there was a reason.   It remains a puzzle:  what exactly were they thinking of?    In any case the Cardinals moved Cepeda out—and Pinson in.   Pinson, who had been with the Reds for more than ten years, was traded to the Cardinals for Wayne Granger and Bobby Tolan, our second Bobby Tolan reference of the Blog.   Cepeda was dealt to Atlanta for Joe Torre.  

            In 1969 baseball lowered the mounds and moved in a lot of fences in a concerted effort to help the hitters.   The hitting numbers did spike up in 1969, but, in real terms, both Pinson and Cepeda played about as well in 1969 as they had in 1968.   Both players did, however, have better seasons in 1970:

 

            YEAR  Player  Age    G   AB  HR  RBI   AVG   SLG   OBA   OPS  WS  LS  W Pct
            1969  Cepeda   31  154  573  22   88  .257  .428  .325  .753  18  15   .539
            1969  Pinson   30  132  495  10   70  .255  .384  .303  .686  14  15   .489
            1970  Cepeda   32  148  567  34  111  .305  .543  .365  .908  19  11   .620
            1970  Pinson   31  148  574  24   82  .286  .481  .319  .800  15  14   .516

 

            Better seasons, but neither man was the player he had once been.  Superficially, Cepeda’s 1970 season (.305, 34 homers, 111 RBI) is similar to his 1963 season (.316, 34 homers, 97 RBI).   If you look at the offensive context, though, it’s not the same, and neither player was building his Hall of Fame credentials the way he should have been, considering his age. 

            Still, this was the fifth consecutive season that Cepeda had out-played Pinson, and he had pulled gradually ahead.   Cepeda’s career won-lost record was now 253-124 (.672); Pinson’s was 243-151 (.616).   Cepeda had moved ahead by 23½ games. 

            In 1971 both men were sub-.500 players, Cepeda playing better for the sixth consecutive season.  In early 1972 Cepeda was injured, missed most of the season, and was traded by the Braves to Oakland in exchange for Denny McLain, whose career was in a warp-speed nosedive.   Cepeda batted just three times for Oakland, and was released by them after the season:

 

            YEAR  Player  Age    G   AB  HR  RBI   AVG   SLG   OBA   OPS  WS  LS  W Pct
            1971  Cepeda   33   71  250  14   44  .276  .492  .330  .822   7   8   .460
            1971  Pinson   32  146  566  11   35  .263  .376  .295  .672  13  17   .426
            1972  Cepeda   34   28   84   4    9  .298  .476  .352  .828   3   1   .672
            1972  Cepeda   34    3    3   0    0  .000  .000  .000  .000   0   0   .000
            1972  Pinson   33  136  484   7   49  .275  .376  .321  .697  17  10   .647

 

            Vada Pinson’s 1972 season demands explanation.   What doesn’t get mentioned a lot is that the American League batting average in 1972 was four points lower than it had been in 1968.  The ERA was higher, but barely—2.99 in 1968, 3.06 in 1972.   Rod Carew won the 1972 batting title at .318—and with zero home runs. 

            This was what triggered the DH Rule; again, I think a lot of younger people probably don’t understand that.  Baseball in the mid-1960s had 1) very little offense, and 2) stagnant attendance.    Associating the two, baseball made a deliberate effort to increase offense in 1969.   This worked, for a while, but by 1972 the American League was back to where it had been in 1968, while the National League seemed to be drifting gradually back in that direction.   The American League adopted the DH Rule entirely to increase offense—which seems laughable in retrospect, since we still have the DH Rule now, when offense is so abundant.  

            Anyway, the American League in 1972 is like 1968; it’s a hitter’s desert.  Vada Pinson’s .275 batting average in 1972 was 36 points above the league average, and was the 15th-best average in the league.  In addition, Pinson was playing now in a park (the Big A in Anaheim) that had a park run index of 75, which was not only the lowest in the majors, but the lowest in the majors in years.   Every run was huge.   These things cause Pinson’s 1972 season to score at 17-10, even though the numbers are superficially unimpressive. 

            Cepeda hooked on as the Red Sox’ regular DH in 1973 and had an OK season.  Pinson’s numbers went down while the context numbers went up:

 

            YEAR  Player  Age    G   AB  HR  RBI   AVG   SLG   OBA   OPS  WS  LS  W Pct
            1973  Cepeda   35  142  550  20   86  .289  .444  .350  .793  13  14   .496
            1973  Pinson   34  124  466   8   57  .260  .367  .286  .653  10  16   .389

 

            Cepeda, by now unable to run at all, was released in spring training, 1974, despite his OK season the year before.  He was signed by the Kansas City Royals.  Pinson, on the other hand, was traded to the Royals in exchange for Barry Raziano and cash, and so Pinson and Cepeda were teammates for part of 1974 for the only time in their careers.  Pinson hit .276, and thus lasted into the 1975 season, while Cepeda’s career came to an end:

 

            YEAR  Player  Age    G   AB  HR  RBI   AVG   SLG   OBA   OPS  WS  LS  W Pct
            1974  Cepeda   36   33  107   1   18  .215  .290  .282  .572   1   5   .175
            1974  Pinson   35  115  406   6   41  .276  .374  .312  .686  11  13   .458
            1975  Pinson   36  103  319   4   22  .223  .335  .248  .583   5  14   .281

 

            That was the end of the line for Pinson.    Cepeda’s final career record was 277-152 (.646), while Pinson’s was 299-221 (.575). 
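
For anyone checking the arithmetic, the career winning percentages are simply Win Shares divided by Win Shares plus Loss Shares. A minimal sketch (the function name is mine):

```python
def winning_pct(win_shares, loss_shares):
    """Winning percentage from a Win Shares / Loss Shares record."""
    return win_shares / (win_shares + loss_shares)


cepeda = winning_pct(277, 152)
pinson = winning_pct(299, 221)
print(f"Cepeda {cepeda:.3f}, Pinson {pinson:.3f}")
# Cepeda 0.646, Pinson 0.575
```

The single-season percentages in the tables will not always match this formula exactly (18-14 is listed as .571, for instance), presumably because they are figured from unrounded Win Shares and Loss Shares.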

            We have to conclude, then, that Cepeda was in fact a better player than Pinson, and that, if one or the other was to go into the Hall of Fame, the right man was selected.  

Breaking their records down into offense and defense:

 

            Cepeda on offense        237-  93   .717

            Cepeda on defense         40-  58   .408

           

            Pinson on offense          234-171   .578

            Pinson on defense           65-  50   .567

 

            Cepeda was 40½ Win Shares better at bat, while Pinson was 16½  Win Shares better in the field.  
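
The 40½ and 16½ figures can be reproduced with the ordinary games-ahead formula applied to the offense and defense records above: half of the sum of the extra wins and the fewer losses. That this is the intended formula is my reading, and the `margin` helper is only for illustration.

```python
def margin(leader, trailer):
    """Games-ahead style margin between two (wins, losses) records."""
    (w1, l1), (w2, l2) = leader, trailer
    return ((w1 - w2) + (l2 - l1)) / 2


offense = margin((237, 93), (234, 171))  # Cepeda over Pinson at bat
defense = margin((65, 50), (40, 58))     # Pinson over Cepeda in the field
print(offense, defense)
# 40.5 16.5
```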

 
 

COMMENTS (9 Comments, most recent shown first)

Chihuahua332
I was very interested to read your Pitch Sequence Modeling article. One thing about this analysis is that the logic is built from the hitter's perspective, not the pitcher's.

The first gate in your model is whether the batter swings or doesn't swing with the second gate being whether a taken pitch is a ball or a strike. That is the view from the batter's side of the world. From the pitcher's perspective it's the reverse. The first gate is whether or not they throw a strike or a ball (I realize that is highly simplified) and the second gate is whether or not the batter swings at the pitch.

Using the model from the pitcher's perspective better matches the sequence of the pitch. First, the pitcher throws the ball which is going to be either a ball or a strike. Second, the batter decides whether to swing or to take the pitch. Third, there is either contact or no contact and finally a hit ball is either fair or foul.

Obviously the data for this model is more difficult to compile because it is dependent on having accurate pitch location data for every pitch but I think that it would give a more accurate picture.

Whatcha think?
8:08 AM Jun 12th
 
ventboys
It seems weird to me that hot weather teams have larger than normal home field advantages. The conventional wisdom was that hot weather teams suffered from those conditions, as their players wilted in the constant summer heat. I remember reading that the Rangers' players, in particular, wilted in the summer heat in the early 1990's.

It's always comforting to see that we still don't really know so many things about this game. Arguments are only fun when the answer is in doubt.
12:58 AM May 8th
 
jrickert
My guess is that some sort of Markov chain will be needed to make the pitch sequencing reasonably accurate - for example, different gate values at different counts.
Taking a look at the gate values for different counts in 2008 gave me

cnt gate1 gate2 gate3 gate4
0-0 .722 .569 .805 .517
1-0 .586 .596 .819 .528
0-1 .535 .792 .788 .516
2-0 .593 .536 .849 .536
1-1 .472 .762 .810 .520
0-2 .508 .926 .765 .493
3-0 .928 .383 .894 .496
2-1 .409 .722 .837 .521
1-2 .420 .895 .792 .508
3-1 .452 .634 .870 .552
2-2 .340 .859 .819 .515
3-2 .263 .843 .862 .523

I haven't checked to see if there are different gate values for various counts arrived at through different sequences (like 1-1 through BC,BS,FB, etc.)
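
[Editor's sketch] The Markov-chain idea above can be sketched by treating each count as a state and chaining the gate values from one count to the next. The comment does not say which side of each gate the numbers represent, so the reading here is an assumption inferred from the values: gate1 = P(taken), gate2 = P(ball | taken), gate3 = P(contact | swing), gate4 = P(fair | contact). Hit batsmen, foul bunts and caught foul tips are ignored.

```python
from functools import lru_cache

# Gate values by (balls, strikes), copied from the table above.
# ASSUMED meaning: (P(taken), P(ball | taken), P(contact | swing), P(fair | contact)).
GATES = {
    (0, 0): (.722, .569, .805, .517), (1, 0): (.586, .596, .819, .528),
    (0, 1): (.535, .792, .788, .516), (2, 0): (.593, .536, .849, .536),
    (1, 1): (.472, .762, .810, .520), (0, 2): (.508, .926, .765, .493),
    (3, 0): (.928, .383, .894, .496), (2, 1): (.409, .722, .837, .521),
    (1, 2): (.420, .895, .792, .508), (3, 1): (.452, .634, .870, .552),
    (2, 2): (.340, .859, .819, .515), (3, 2): (.263, .843, .862, .523),
}


@lru_cache(maxsize=None)
def outcome_probs(balls, strikes):
    """Return (P(walk), P(strikeout), P(ball in play)) from this count."""
    if balls == 4:
        return (1.0, 0.0, 0.0)
    if strikes == 3:
        return (0.0, 1.0, 0.0)
    taken, ball_if_taken, contact, fair = GATES[(balls, strikes)]
    swing = 1 - taken
    p_ball = taken * ball_if_taken
    p_strike = taken * (1 - ball_if_taken) + swing * (1 - contact)
    p_fair = swing * contact * fair
    p_foul = swing * contact * (1 - fair)

    after_ball = outcome_probs(balls + 1, strikes)
    after_strike = outcome_probs(balls, strikes + 1)
    totals = [p_ball * b + p_strike * s
              for b, s in zip(after_ball, after_strike)]
    totals[2] += p_fair  # a fair ball ends the plate appearance in play
    if strikes < 2:
        # A foul with fewer than two strikes counts as a strike.
        return tuple(t + p_foul * s for t, s in zip(totals, after_strike))
    # With two strikes a foul leaves the count unchanged, so rescale by
    # the probability that the pitch is not a foul.
    return tuple(t / (1 - p_foul) for t in totals)


walk, strikeout, in_play = outcome_probs(0, 0)
print(f"walk {walk:.3f}  strikeout {strikeout:.3f}  in play {in_play:.3f}")
```

If the assumed reading of the gates is right, the three numbers should land near league-wide walk, strikeout and ball-in-play rates per plate appearance, which is one way to test the interpretation.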


11:44 PM May 4th
 
Steven Goldleaf
Wasn’t Pinson’s age in some dispute, long after his playing career had ended? I think he picked up a few additional years at some point, and (going on memory now) was in his early twenties, not his late teens, when he broke into the majors.

He was my first favorite MLB player, at the age of 8 in the summer of 1961 (me, not Vada) and I just loved his name. Just saying the words “Vada Pinson” makes me smile today.

Fascinating how their careers track each other so neatly.

5:42 PM May 4th
 
rcberlo
Wonderful stuff! Back in the 1950's the makers of the spinner-based Ethan Allen's baseball game tried to market a variant that put balls and strikes on the disks along with the various offensive outcomes (except for walks and strikeouts). I think all they were trying to do was to have the disk recreate each batter's walk and strikeout frequency. I hated how long it took to play a game, and the "improvement" was dropped.
3:32 PM May 4th
 
Ron
I really enjoyed this, as it takes me back to the time I fell in love with the game. I was 10 years old in '67. My gut feeling is that I remember Cepeda being a hitter to be feared. I was afraid when he came to bat against the Cubs (my team). I remember Pinson as a player who could hurt you... I remember Cepeda as a player who would hurt you.
1:44 PM May 4th
 
pob14
The Diamond Mind computer baseball game has had pitch-by-pitch simulation for several years now (it's an option; you can still play batter-by-batter), and I believe Action also added this recently. When you add an "era" (not an ERA, but a season or group of seasons you want to play in or create players for), the game will estimate ball/strike percentages based on the stats. I may have to check on how close their percentages are to your spreadsheet.
9:35 AM May 4th
 
Trailbzr
Nice feature (I mean a weekly blog). Hope you keep it up.

Retrosheet has comprehensive pitch sequencing data going back several years. It would seem a place to start could be identifying some recent pitchers whose K and BB (and maybe Hit) rates mirror Ford's, and create a chart of B,C,S,F,X by count. Then the gate rates could be calculated in reverse order.
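
[Editor's sketch] Turning a B, C, S, F, X tally into the four gate rates from the article is a few divisions. The code below assumes the usual meaning of those pitch codes (B = ball, C = called strike, S = swinging strike, F = foul, X = ball put in play); the tally itself is made up purely to show the arithmetic, and the function and key names are mine.

```python
def gate_rates(tally):
    """Convert a pitch-code tally into the four gate rates."""
    b, c, s, f, x = (tally[code] for code in "BCSFX")
    taken = b + c
    swung = s + f + x
    contact = f + x
    return {
        "taken": taken / (taken + swung),      # gate 1
        "called_strike_if_taken": c / taken,   # gate 2
        "contact_if_swung": contact / swung,   # gate 3
        "fair_if_contact": x / contact,        # gate 4
    }


# Hypothetical tally for one count, NOT real Retrosheet data.
sample = {"B": 40, "C": 20, "S": 10, "F": 15, "X": 15}
rates = gate_rates(sample)
for name, value in rates.items():
    print(f"{name}: {value:.3f}")
```

Running this once per count would fill in a chart like the one jrickert posted above.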
7:27 AM May 4th
 
 