
The Revenge of Matty Alou

July 30, 2020
 

            I have been working ahead of you all in the Runs Saved series, meaning that the stuff I am publishing is several days behind the stuff I am working on at the moment.   I am working ahead of you, but publishing stuff faster than I can create it, so you are catching up with me—a situation familiar to any teacher who has ever been required to teach a subject that he or she doesn’t know well.   At one point I was several weeks ahead of you; now, I’m maybe three days ahead. 

            Anyway, there was a question asked to me in "Hey, Bill" on Wednesday morning which I didn’t know the answer to but knew how to get the answer to, so I thought I would take a day off from the Runs Saved series to address the issue.  The question had to do with Matty Alou hitting .330 every year in a league in which the league batting averages were at a very low level.   The essence of the question was, "Matty Alou hit .332 in a league in which Bob Gibson had a 1.12 ERA, so what would he hit in a normal league?"

            The answer, by the way, is .356.   But give us a minute. 

            Dallas Adams invented a process to answer that question in 1977 or 1978; actually Dallas took a formula that I had invented—the Log5 method—and figured out how to generalize this formula so that it applied to a wider range of issues, thus giving it much more value.   Also, some guy on the SABR Statistical Analysis Boards says that this process is the same thing as the Bradley-Terry process which was first published some time in the 1950s; I can’t figure out whether he is right or wrong about that, but I will note it in any case.  

            THIS METHOD HAS TREMENDOUS UNUSED POTENTIAL IN ANALYZING SPORTS.   I cannot stress this enough.  A few examples:

            If you are creating a simulation, such as a Table Game or other simulation, you need to know "If a .328 hitter is facing a pitcher who gives up a .206 batting average in a league in which the average is .262, what is the resulting batting average?  The resulting On Base percentage?  What are the resulting frequencies of walks, doubles, triples, homers, or strikeouts?"

            This method is how you can calculate that, and get the answer right.   It’s really the ONLY method by which you can calculate that, and get the answer right. 
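For the first of those questions, here is a minimal sketch. It assumes the odds-ratio form of a Log5 matchup. In that form, odds means x / (1 - x), and the batter's hit odds are multiplied by the pitcher's hit-allowed odds and divided by the league's odds. That form is an assumption made for illustration; it is not a transcription of Dallas Adams's worksheet, and the function names are hypothetical.

```python
def odds(x: float) -> float:
    """Convert a rate (e.g. a batting average) to odds: x / (1 - x)."""
    return x / (1.0 - x)


def from_odds(r: float) -> float:
    """Convert odds back to a rate: r / (1 + r)."""
    return r / (1.0 + r)


def matchup(batter: float, pitcher: float, league: float) -> float:
    """Expected rate when a batter faces a pitcher, both measured in one league."""
    return from_odds(odds(batter) * odds(pitcher) / odds(league))


print(f"{matchup(0.328, 0.206, 0.262):.3f}")
```

Under that assumption, the .328 hitter facing the .206 pitcher in a .262 league comes out at about .263. A quick sanity check on the form: a perfectly average pitcher (.262) leaves the hitter at exactly his own .328.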

            What if they are playing in different leagues?  The batter hits .328 in a league in which the batting average is .270; the pitcher limits batters to a .249 average in a league in which the average is .256.   What will be the outcomes of the confrontation?

            This method tells you the correct answers.

            Adjusting parks.   Let us suppose that the batting average for the season is .252 in Dodger Stadium and .282 in Colorado, and let us suppose that a player hits .321 in Colorado.   What will he hit in Dodger Stadium?

            This is the method that you use to answer that question, and all questions like that.  But it goes well beyond that.   Let’s suppose that you are dealing with two candidates to make the NCAA basketball tournament, one of which is 25-7 and the other 19-13 against a much tougher schedule.   Which is actually the better won-lost record?

            This is the method you would use to study that question. 

            Suppose that a player hits .285 in Double-A, but in conditions in which the environmental batting average is .261.   Suppose that he moves to Triple-A, but in Triple-A he plays in conditions under which the environmental batting average is .284.   What will he hit?

            This is the method that you would use to study that question. 

            It is a method that SHOULD BE used regularly by sports analysts, but, because it is a little bit confusing, a little bit awkward, it isn’t.   That’s why I am writing this article, in response to the Hey, Bill question about Matty Alou.   It relates to an extremely significant issue. 

            OK, we start with the Log5 method.  The Log5 of a number represents that number’s relative strength when compared to .500.   The Log5 of any number less than .500 is less than .500 and less than the number itself, and the Log5 of any number greater than .500 is greater than .500 and greater than the number itself.  It is not actually a complicated system.  

            The issue here is that to solve many of these problems that I have talked about, you have to figure the Log5 of a Log5.   That gets hairy.   But I’ll walk you through it.  Basically, we are only using two formulas here.  We use them over and over, but we’re just using two formulas.  

            If our starting point number is x, then the first formula, for the Log5 of x, is

                                    Log5 of X = X / (2 * (1 - X))

 

            If X is .600, then the Log5 of X is .750.   We call it 750, rather than .750, because it is easier and doesn’t make any difference.     And the other formula is just A / (A + B), where A and B are the two numbers being compared.   Just two really simple formulas; that’s all that it is.   It’s just using these two simple formulas again and again that makes it seem complicated.   
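The two formulas can be written as two small Python functions. This is a minimal illustration, not the author's own tool. The names log5 and compare are hypothetical labels for "the Log5 of X" and for the simple comparison A / (A + B).

```python
def log5(x: float) -> float:
    """Log5 of x, i.e. x / (2 * (1 - x)); undefined at x = 1."""
    if not 0.0 <= x < 1.0:
        raise ValueError("x must be at least 0 and below 1")
    return x / (2.0 * (1.0 - x))


def compare(a: float, b: float) -> float:
    """The 'other formula': the simple comparison A / (A + B)."""
    return a / (a + b)


print(round(log5(0.600), 3))                          # 0.75, the .600 example
print(round(log5(0.500), 3))                          # 0.5, .500 is its own Log5
print(round(compare(log5(0.280), log5(0.250)), 3))    # 0.538
```

The guard reflects the formula itself: at 1.000 the denominator is zero, so a perfect average has no Log5.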

This process takes 15 lines.   For the purpose of simple illustration, I’ll start with a simpler question:  If a player hits .280 in a league in which the batting average is .250, what would be the equivalent batting average if the league batting average was .260?  Lines 1 and 2 are just the player’s batting average, and the league batting average:

 

Line 1   Batting Average                    .280
Line 2   League Batting Average             .250

 

            Lines 3 and 4 are the Log5s of these two numbers:

 

Line 1   Batting Average                    .280
Line 2   League Batting Average             .250
Line 3   Log5 Batting Average               .194
Line 4   Log5 League Batting Average        .167

 

            And Line 5 is a simple arithmetic comparison of Line 3 to Line 4:

 

Line 1   Batting Average                    .280
Line 2   League Batting Average             .250
Line 3   Log5 Batting Average               .194
Line 4   Log5 League Batting Average        .167
Line 5   Comparison of Line 3 to Line 4     .538

 

            Because .280 is higher than .250, this results in a figure larger than .500, but it can never result in a number lower than zero or higher than 1.000.  On Line 6 we introduce the alternative batting average that we want to see—in this case, .260:

 

Line 1   Batting Average                    .280
Line 2   League Batting Average             .250
Line 3   Log5 Batting Average               .194
Line 4   Log5 League Batting Average        .167
Line 5   Comparison of Line 3 to Line 4     .538
Line 6   Alternate League Batting Average   .260

           

And on Line 7 we figure the Log5 of that:

Line 1   Batting Average                    .280
Line 2   League Batting Average             .250
Line 3   Log5 Batting Average               .194
Line 4   Log5 League Batting Average        .167
Line 5   Comparison of Line 3 to Line 4     .538
Line 6   Alternate League Batting Average   .260
Line 7   Log5 Alternate League B Average    .176

 

            Then we compare Line 4 in this process to Line 7 by the other formula, a simple comparison:

 

Line 1   Batting Average                    .280
Line 2   League Batting Average             .250
Line 3   Log5 Batting Average               .194
Line 4   Log5 League Batting Average        .167
Line 5   Comparison of Line 3 to Line 4     .538
Line 6   Alternate League Batting Average   .260
Line 7   Log5 Alternate League B Average    .176
Line 8   Comparison of Line 4 and Line 7    .487

 

            Line 8 is less than .500 because .250 is less than .260:  the Log5 of .250 (which appears on Line 4) is less than the Log5 of .260 (which appears on Line 7), so the comparison of the two falls below .500.  It’s basically saying that .260 is a stronger number than .250.   ".487" represents the relative strength of .250 to .260. 

            We now have a number which represents the relative strength of .280 to .250 (.538), and a number which represents the relative strength of .250 to .260 (.487).   What we need to do now is triangulate these two numbers so that they give us the relative strength of .280 to .260.   We do that on Lines 9, 10 and 11: 

 

Line 1   Batting Average                    .280
Line 2   League Batting Average             .250
Line 3   Log5 Batting Average               .194
Line 4   Log5 League Batting Average        .167
Line 5   Comparison of Line 3 to Line 4     .538
Line 6   Alternate League Batting Average   .260
Line 7   Log5 Alternate League B Average    .176
Line 8   Comparison of Line 4 and Line 7    .487
Line 9   Log5 of Line 5                     .583
Line 10  Log5 of Line 8                     .474
Line 11  Comparison of Line 9 to Line 10    .552

 

            What that .552 really means is "When a .280 batting average is compared to a .250 batting average and a .260 league is compared to a .250 league, the .280 batting average is stronger than the .260 league average, but not by as much as it is stronger than the .250 league average." 
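Lines 1 through 11 are just the two formulas applied in a fixed order. Here is a Python sketch for the .280/.250/.260 example, with illustrative function names; each line of the worksheet becomes one dictionary entry keyed by its line number.

```python
def log5(x):
    return x / (2.0 * (1.0 - x))


def compare(a, b):
    return a / (a + b)


def lines_1_to_11(avg, league, alt_league):
    """Return worksheet Lines 1-11 as a dict keyed by line number."""
    line = {1: avg, 2: league, 6: alt_league}
    line[3] = log5(avg)
    line[4] = log5(league)
    line[5] = compare(line[3], line[4])     # player vs. his own league
    line[7] = log5(alt_league)
    line[8] = compare(line[4], line[7])     # old league vs. new league
    line[9] = log5(line[5])
    line[10] = log5(line[8])
    line[11] = compare(line[9], line[10])   # player vs. the new league
    return line


worksheet = lines_1_to_11(0.280, 0.250, 0.260)
for n in sorted(worksheet):
    print(f"Line {n:>2}: {worksheet[n]:.3f}")
```

Printed to three places, the values match the worksheet figures (.538 on Line 5, .487 on Line 8, .552 on Line 11).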

            If the league batting average is .250, the hitters succeed (in that respect) 25% of the time.  That means that the PITCHERS succeed 75% of the time.  We put that number on Line 12, to represent the league’s typical pitcher:

Line 1   Batting Average                    .280
Line 2   League Batting Average             .250
Line 3   Log5 Batting Average               .194
Line 4   Log5 League Batting Average        .167
Line 5   Comparison of Line 3 to Line 4     .538
Line 6   Alternate League Batting Average   .260
Line 7   Log5 Alternate League B Average    .176
Line 8   Comparison of Line 4 and Line 7    .487
Line 9   Log5 of Line 5                     .583
Line 10  Log5 of Line 8                     .474
Line 11  Comparison of Line 9 to Line 10    .552
Line 12  One minus League Batting Average   .750

 

 

            Then, on Lines 13 and 14, we figure the Log5 of the comparison on Line 11 (.552) and the Log5 of the pitchers’ success percentage on Line 12 (.750):

 

Line 1   Batting Average                    .280
Line 2   League Batting Average             .250
Line 3   Log5 Batting Average               .194
Line 4   Log5 League Batting Average        .167
Line 5   Comparison of Line 3 to Line 4     .538
Line 6   Alternate League Batting Average   .260
Line 7   Log5 Alternate League B Average    .176
Line 8   Comparison of Line 4 and Line 7    .487
Line 9   Log5 of Line 5                     .583
Line 10  Log5 of Line 8                     .474
Line 11  Comparison of Line 9 to Line 10    .552
Line 12  One minus League Batting Average   .750
Line 13  Log5 of Line 11                    .615
Line 14  Log5 of Line 12                    1.500

 

            I’m not really sure why we do that; Dallas Adams figured all of this out, and I just stumble around in the dark until I can remember what he did.  I don’t always remember why it is done that way.   Anyway, we’re almost done.   Now we make a simple comparison of Line 13 to Line 14:

 

Line 1   Batting Average                    .280
Line 2   League Batting Average             .250
Line 3   Log5 Batting Average               .194
Line 4   Log5 League Batting Average        .167
Line 5   Comparison of Line 3 to Line 4     .538
Line 6   Alternate League Batting Average   .260
Line 7   Log5 Alternate League B Average    .176
Line 8   Comparison of Line 4 and Line 7    .487
Line 9   Log5 of Line 5                     .583
Line 10  Log5 of Line 8                     .474
Line 11  Comparison of Line 9 to Line 10    .552
Line 12  One minus League Batting Average   .750
Line 13  Log5 of Line 11                    .615
Line 14  Log5 of Line 12                    1.500
Line 15  Comparison of Line 13 to Line 14   .291

 

            And that’s our answer:   If a player hits .280 in a .250 league, he would hit .291 in a .260 league of the same quality.  Which makes sense, when you think about it; .280 is 12% higher than .250, and .291 is 12% higher than .260.    What we’re really doing is just increasing the player’s batting average in proportion to the change in the league batting average, EXCEPT that if we did that without limitation then a .900 hitter would hit more than 1.000, which isn’t possible.   The process is just bending the lines very gradually so that we keep the averages in the realm of the possible. 
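The "bending" shows up clearly when plain proportional scaling is set beside the Log5 adjustment. This sketch uses a shortcut that is algebraically equivalent to the fifteen lines. Each average is converted to hits-per-out odds, x / (1 - x); the player's odds are scaled by the ratio of the new league's odds to the old league's odds; and the result is converted back. This is the same shortcut rezk42's comment below describes. The function names are hypothetical.

```python
def odds(x):
    """Hits per out: x / (1 - x)."""
    return x / (1.0 - x)


def from_odds(r):
    """Back from hits per out to an average: r / (1 + r)."""
    return r / (1.0 + r)


def log5_adjust(avg, league, alt_league):
    """Shortcut equivalent to the fifteen-line worksheet."""
    return from_odds(odds(avg) * odds(alt_league) / odds(league))


def proportional_adjust(avg, league, alt_league):
    """Naive rule: keep the same percentage edge over the league."""
    return avg * alt_league / league


for avg in (0.280, 0.900, 0.980):
    naive = proportional_adjust(avg, 0.250, 0.260)
    bent = log5_adjust(avg, 0.250, 0.260)
    print(f"{avg:.3f} hitter: proportional {naive:.3f}, Log5 {bent:.3f}")
```

For the .280 hitter the two rules agree to three places (.291). The proportional rule, however, carries a .980 hitter past 1.000, while the Log5 version stays just above .980.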

            OK, let’s do Matty Alou, 1968.  In 1968 the league batting average was .243, so these are the first five lines of the process:

 

Line 1   Batting Average                    .332
Line 2   League Batting Average             .243
Line 3   Log5 Batting Average               .249
Line 4   Log5 League Batting Average        .161
Line 5   Comparison of Line 3 to Line 4     .608

 

.332 compared to .243 is much higher than .280 compared to .250:

 

                                            Alou 1968   .280 example
Line 1   Batting Average                    .332        .280
Line 2   League Batting Average             .243        .250
Line 3   Log5 Batting Average               .249        .194
Line 4   Log5 League Batting Average        .161        .167
Line 5   Comparison of Line 3 to Line 4     .608        .538

 

            Now we need to decide what is a "typical" league batting average.  Let us say it is .263.   I’m not sure what the major league batting average since 1900 is, but it’s within a point or two of .263, so we enter that:

 

Line 1   Batting Average                    .332
Line 2   League Batting Average             .243
Line 3   Log5 Batting Average               .249
Line 4   Log5 League Batting Average        .161
Line 5   Comparison of Line 3 to Line 4     .608
Line 6   Alternate League Batting Average   .263

 

            And then we just let the process run:

 

Line 1   Batting Average                    .332
Line 2   League Batting Average             .243
Line 3   Log5 Batting Average               .249
Line 4   Log5 League Batting Average        .161
Line 5   Comparison of Line 3 to Line 4     .608
Line 6   Alternate League Batting Average   .263
Line 7   Log5 Alternate League B Average    .178
Line 8   Comparison of Line 4 and Line 7    .474
Line 9   Log5 of Line 5                     .774
Line 10  Log5 of Line 8                     .450
Line 11  Comparison of Line 9 to Line 10    .633
Line 12  One minus League Batting Average   .757
Line 13  Log5 of Line 11                    .861
Line 14  Log5 of Line 12                    1.558
Line 15  Comparison of Line 13 to Line 14   .356
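The whole fifteen-line worksheet can be run mechanically. This Python sketch, with hypothetical function names, follows the lines exactly as laid out in the table. For Matty Alou's 1968 inputs it reproduces every figure, ending at .356 on Line 15.

```python
def log5(x):
    return x / (2.0 * (1.0 - x))


def compare(a, b):
    return a / (a + b)


def worksheet(avg, league, alt_league):
    """Run all fifteen lines; returns {line number: value}."""
    w = {1: avg, 2: league, 6: alt_league}
    w[3], w[4] = log5(avg), log5(league)
    w[5] = compare(w[3], w[4])
    w[7] = log5(alt_league)
    w[8] = compare(w[4], w[7])
    w[9], w[10] = log5(w[5]), log5(w[8])
    w[11] = compare(w[9], w[10])
    w[12] = 1.0 - league                 # the league's typical pitcher
    w[13], w[14] = log5(w[11]), log5(w[12])
    w[15] = compare(w[13], w[14])        # the normalized average
    return w


alou_1968 = worksheet(0.332, 0.243, 0.263)
for n in sorted(alou_1968):
    print(f"Line {n:>2}: {alou_1968[n]:.3f}")
```

The same function returns .291 for the earlier .280/.250/.260 example.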

 

            Matty Alou’s .332 batting average in 1968 is equivalent to a .356 batting average in a historically normal season.  I started to figure the Normalized batting average for every batting champion in history:

 

Player            Season    Avg    Lg Avg   Normalized
Matty Alou        1968 NL   .332   .243     .356
Honus Wagner      1900 NL   .381   .279     .362
Nap Lajoie        1901 AL   .426   .277     .409
Jesse Burkett     1901 NL   .376   .267     .371
Ed Delahanty      1902 AL   .376   .275     .362
Ginger Beaumont   1902 NL   .357   .259     .362
Nap Lajoie        1903 AL   .344   .255     .353
Honus Wagner      1903 NL   .355   .269     .348

 

            But that was too much work, so I backed off to figuring the adjusted batting average for everybody who beat the league batting average by 100 points or more:

 

Player            Season    Avg    Lg Avg   Normalized
Cy Seymour        1905 NL   .377   .255     .387
George Stone      1906 AL   .358   .249     .375
Ty Cobb           1907 AL   .350   .247     .369
Honus Wagner      1907 NL   .350   .243     .374
Honus Wagner      1908 NL   .354   .230     .396
Ty Cobb           1909 AL   .377   .244     .401
Ty Cobb           1910 AL   .383   .243     .408
Nap Lajoie        1910 AL   .384   .243     .409

 

            So we see here that, normalized to the league batting average, Nap Lajoie’s .384 average in 1910 is actually very slightly more impressive than his .426 average in 1901, though both round to .409.   Not that there is any consensus about what Lajoie’s batting average was in either 1901 or 1910.  Ty Cobb’s batting average normalizes to .408 in 1910, 1911 and 1912:

 

Player            Season    Avg    Lg Avg   Normalized
Ty Cobb           1911 AL   .420   .273     .408
Joe Jackson       1911 AL   .408   .273     .396
Sam Crawford      1911 AL   .378   .273     .366
Ty Cobb           1912 AL   .410   .265     .408
Joe Jackson       1912 AL   .395   .265     .393
Tris Speaker      1912 AL   .383   .265     .381
Nap Lajoie        1912 AL   .368   .265     .366
Heine Zimmerman   1912 NL   .372   .272     .361

 

 

Player            Season    Avg    Lg Avg   Normalized
Ty Cobb           1913 AL   .390   .256     .399
Joe Jackson       1913 AL   .373   .256     .382
Tris Speaker      1913 AL   .363   .256     .371
Ty Cobb           1914 AL   .368   .248     .387
Benny Kauff       1914 FL   .370   .263     .370
Ty Cobb           1915 AL   .369   .248     .388
Tris Speaker      1916 AL   .386   .248     .405
Ty Cobb           1916 AL   .371   .248     .390

 

Player            Season    Avg    Lg Avg   Normalized
Ty Cobb           1917 AL   .383   .248     .402
George Sisler     1917 AL   .352   .248     .370
Tris Speaker      1917 AL   .352   .248     .370
Ty Cobb           1918 AL   .382   .254     .393
Ty Cobb           1919 AL   .384   .268     .378
George Sisler     1920 AL   .407   .284     .382
Tris Speaker      1920 AL   .388   .284     .363
Rogers Hornsby    1920 NL   .370   .270     .362

 

            After Ty Cobb, .402 in 1917, the next normalized .400 hitter is Rogers Hornsby in 1924:

 

Player            Season    Avg    Lg Avg   Normalized
Harry Heilmann    1921 AL   .394   .292     .360
Rogers Hornsby    1921 NL   .397   .289     .366
George Sisler     1922 AL   .420   .285     .393
Ty Cobb           1922 AL   .401   .285     .375
Rogers Hornsby    1922 NL   .401   .292     .367
Harry Heilmann    1923 AL   .403   .282     .380
Babe Ruth         1923 AL   .393   .282     .370
Rogers Hornsby    1924 NL   .424   .283     .400

 

            I’ll include Bill Terry in 1930 because he hit .400, even though he actually was not 100 points better-than-league:

 

Player            Season    Avg    Lg Avg   Normalized
Harry Heilmann    1925 AL   .393   .292     .359
Rogers Hornsby    1925 NL   .403   .292     .369
Harry Heilmann    1927 AL   .398   .285     .372
Al Simmons        1927 AL   .392   .285     .366
Rogers Hornsby    1928 NL   .387   .281     .366
Lefty O'Doul      1929 NL   .398   .294     .362
Bill Terry        1930 NL   .401   .303     .355
Al Simmons        1931 AL   .390   .278     .372

 

            After Rogers Hornsby in 1924 the next normalized .400 hitter—and the last normalized .400 hitter—is Ted Williams in 1941:

 

Player            Season    Avg    Lg Avg   Normalized
Chuck Klein       1933 NL   .368   .266     .364
Arky Vaughan      1935 NL   .385   .277     .368
Joe Medwick       1937 NL   .374   .272     .363
Joe DiMaggio      1939 AL   .381   .279     .362
Ted Williams      1941 AL   .406   .266     .402
Stan Musial       1946 NL   .365   .256     .373
Ted Williams      1948 AL   .369   .266     .365
Stan Musial       1948 NL   .376   .261     .378

 

            Normalized to a consistent league batting average, Ted Williams in 1957 is only 4 points behind Ted Williams in 1941:

 

Player            Season    Avg    Lg Avg   Normalized
Ted Williams      1957 AL   .388   .255     .398
Mickey Mantle     1957 AL   .365   .255     .375
Harvey Kuenn      1959 AL   .353   .253     .365
Norm Cash         1961 AL   .361   .256     .369
Roberto Clemente  1967 NL   .357   .249     .374
Rico Carty        1970 NL   .366   .258     .372
Joe Torre         1971 NL   .363   .252     .376
Rod Carew         1974 AL   .364   .258     .370

 

 

Player            Season    Avg    Lg Avg   Normalized
Rod Carew         1975 AL   .359   .258     .365
Rod Carew         1977 AL   .388   .266     .384
George Brett      1980 AL   .390   .269     .383
Wade Boggs        1985 AL   .368   .261     .370
Willie McGee      1985 NL   .353   .252     .366
Tony Gwynn        1987 NL   .370   .261     .372
Wade Boggs        1988 AL   .366   .259     .371
Andres Galarraga  1993 NL   .370   .264     .369

 

 

Player            Season    Avg    Lg Avg   Normalized
Tony Gwynn        1994 NL   .394   .267     .389
Jeff Bagwell      1994 NL   .368   .267     .363
Tony Gwynn        1995 NL   .368   .263     .368
Tony Gwynn        1997 NL   .372   .263     .372
Larry Walker      1998 NL   .363   .262     .364
Larry Walker      1999 NL   .379   .268     .373
Todd Helton       2000 NL   .372   .266     .368
Barry Bonds       2002 NL   .370   .259     .375

 

            Nobody has beaten the league batting average by 100 points since Chipper Jones in 2008, but Mookie Betts just missed it by three points in 2018, so I’ll include him:

 

Player            Season    Avg    Lg Avg   Normalized
Ichiro Suzuki     2004 AL   .372   .270     .364
Chipper Jones     2008 NL   .364   .260     .368
Mookie Betts      2018 AL   .346   .249     .363

 

            The .356 adjusted batting average in 1968 was the highest of Matty Alou’s career.  Matty never played in a league with a batting average as high as .263; .262, yes, but not .263.   His batting average would adjust upward for every season and partial season of his career.  His career batting average would adjust to .319:

 

Lg  Year  AB    H     Avg   Lg Average  Adjusted Average  Adjusted Hits
NL  1960  3     1     .333  .255        .342              1.0
NL  1961  200   62    .310  .262        .311              62.2
NL  1962  195   57    .292  .261        .294              57.3
NL  1963  76    11    .145  .245        .157              11.9
NL  1964  250   66    .264  .254        .273              68.3
NL  1965  324   75    .231  .249        .244              79.1
NL  1966  535   183   .342  .256        .350              187.3
NL  1967  550   186   .338  .249        .354              194.7
NL  1968  558   185   .332  .243        .356              198.6
NL  1969  698   231   .331  .250        .346              241.5
NL  1970  677   201   .297  .258        .302              204.5
NL  1971  609   192   .315  .252        .328              199.8
NL  1972  404   127   .314  .248        .331              133.7
AL  1972  121   34    .281  .239        .308              37.3
AL  1973  497   147   .296  .259        .300              149.1
NL  1973  11    3     .273  .254        .282              3.1
NL  1974  81    16    .198  .258        .202              16.4
Total     5789  1777  .307              .319              1845.7
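The career figure is total adjusted hits divided by total at bats. The Python sketch below recomputes it from the season rows, normalizing each season to .263 with the worksheet's chain of comparisons. The function names are hypothetical. The sketch carries full precision rather than rounded seasonal averages, so its adjusted-hit total can differ from the table's by a hit or so; the career average still comes out at .319.

```python
def log5(x):
    return x / (2.0 * (1.0 - x))


def compare(a, b):
    return a / (a + b)


def normalize(avg, league, target=0.263):
    """Lines 5, 8, 11 and 15 of the worksheet, chained together."""
    line5 = compare(log5(avg), log5(league))
    line8 = compare(log5(league), log5(target))
    line11 = compare(log5(line5), log5(line8))
    return compare(log5(line11), log5(1.0 - league))


# (league, year, AB, H, league batting average), copied from the table
SEASONS = [
    ("NL", 1960, 3, 1, 0.255), ("NL", 1961, 200, 62, 0.262),
    ("NL", 1962, 195, 57, 0.261), ("NL", 1963, 76, 11, 0.245),
    ("NL", 1964, 250, 66, 0.254), ("NL", 1965, 324, 75, 0.249),
    ("NL", 1966, 535, 183, 0.256), ("NL", 1967, 550, 186, 0.249),
    ("NL", 1968, 558, 185, 0.243), ("NL", 1969, 698, 231, 0.250),
    ("NL", 1970, 677, 201, 0.258), ("NL", 1971, 609, 192, 0.252),
    ("NL", 1972, 404, 127, 0.248), ("AL", 1972, 121, 34, 0.239),
    ("AL", 1973, 497, 147, 0.259), ("NL", 1973, 11, 3, 0.254),
    ("NL", 1974, 81, 16, 0.258),
]

ab_total = sum(ab for _, _, ab, _, _ in SEASONS)
hit_total = sum(h for _, _, _, h, _ in SEASONS)
adjusted_hits = sum(ab * normalize(h / ab, lg) for _, _, ab, h, lg in SEASONS)
print(f"Actual {hit_total / ab_total:.3f}, normalized {adjusted_hits / ab_total:.3f}")
```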

 

            Thanks for reading.

 

            Bill 
 
 

COMMENTS (17 Comments, most recent shown first)

rezk42
I take it back, you can do this with multiple stat categories just fine. The answer you get will definitely depend on how you choose to divvy up plate appearances into outcomes. I'll illustrate by moving Carl Yastrzemski from 1968 AL to 2019 AL.

Here are CY's stats for 1968:
1B 105, 2B 32, 3B 2, HR 23, Out 377, BB 119, PA 658

(I'm using Out=AB-H, and PA doesn't include HBP+SH+SF to keep things simple.)

Here are the totals for AL 1968:
1B 9043, 2B 1874, 3B 338, HR 1104, Out 41350, BB 4881, PA 58563

Here are the totals for AL 2019:
1B 12973, 2B 4318, 3B 385, HR 3478, Out 62403, BB 7916, PA 91473

For each category (except PA), do the calculation:

(CY 1968) * [AL 2019 / AL 1968].

For instance, for singles we have

105 * [12973 / 9043] = 150.63.

Doing this for every category, you get totals:

1B 151, 2B 74, 3B 2, HR 72, Out 569, BB 193

This adds up to 1061 PA, many more than the original 658 because the league is bigger. So adjust by multiplying by 658/1061 to get

1B 94, 2B 46, 3B 1, HR 45, Out 353, BB 120,

a line of .345 .464 .684.

What I get very much depends on how I break things up into categories. If I hadn't broken up hits into 1B+2B+3B+HR, then CY's 2019 batting average would only be .327. Because I count 2B and HR as categories separate from singles, he gets a boost since he was well above average in these categories in 1968, and both happened far more frequently in 2019 than 1968.

The nice thing is that this method cannot produce utter nonsense (like more hits than at bats). It's still bad with extreme outliers: moving 1920s Babe Ruth to present day is insane. I guess you might get better results there by doing the sort of thing described in chuck's reply, where you have a multi-step process: you first do an adjustment for broad categories, then subdivide the broad categories and do adjustments for each subdivision.

3:30 PM Aug 22nd
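rezk42's category-by-category procedure can be sketched in a few lines of Python. The counts are the ones given in the comment. The function name is hypothetical, and the sketch follows the comment's simplifications: Out = AB - H, with HBP, SH and SF left out.

```python
CATEGORIES = ("1B", "2B", "3B", "HR", "Out", "BB")

# Counts as given in the comment
YAZ_1968 = dict(zip(CATEGORIES, (105, 32, 2, 23, 377, 119)))
AL_1968 = dict(zip(CATEGORIES, (9043, 1874, 338, 1104, 41350, 4881)))
AL_2019 = dict(zip(CATEGORIES, (12973, 4318, 385, 3478, 62403, 7916)))


def translate(player, old_lg, new_lg):
    """Scale each category by the league's change, then rescale to the original PA."""
    scaled = {k: player[k] * new_lg[k] / old_lg[k] for k in CATEGORIES}
    factor = sum(player.values()) / sum(scaled.values())
    return {k: v * factor for k, v in scaled.items()}


t = translate(YAZ_1968, AL_1968, AL_2019)
hits = t["1B"] + t["2B"] + t["3B"] + t["HR"]
at_bats = hits + t["Out"]
avg = hits / at_bats
obp = (hits + t["BB"]) / (at_bats + t["BB"])
slg = (t["1B"] + 2 * t["2B"] + 3 * t["3B"] + 4 * t["HR"]) / at_bats
print(f"{avg:.3f} / {obp:.3f} / {slg:.3f}")
```

Carried at full precision rather than rounded to whole counts, the slash line should land within about a point of the .345 .464 .684 given in the comment.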
 
rezk42
So here's an explanation of what's really going on with the "Log5" method. (Bill, I remember you described this method in one of your abstracts ... when I read it I understood it is the same as the method I describe below.)

First an observation: the "2" in the formula for Log5 really doesn't do anything! That is, if you had instead defined Log5(x) to be "x/(1-x)" instead of "x/(2*(1-x))", then your method gives the exact same answer. (Basically, the 2s cancel out in all the formulas.)

So I'd rather use this modified function, which I'll call the "ratio" function:

R(x) = x/(1-x).

We can use this to convert all "averages" into "ratios":

For instance, a .250 Batting Average translates into a Batting Ratio of R(.250) = .250/.750 = .333 = 1/3. This is basically saying: for every 1 hit you get, you are out 3 times.

Likewise, a .333 Batting Average translates to a Batting Ratio of R(.333) = .333/.667 = .500 = 1/2. For every 1 hit you are out 2 times.

Basically, if Henry Chadwick had made a different choice, we might be talking about Batting Ratios today, instead of Batting Averages. Notice that a ratio can be anywhere between 0 and infinity.

You can convert backwards using the formula A(y) = y/(1+y). So a .333 Batting Ratio translates to a Batting Average of A(.333) = .333/1.333 = .250.

So now I can describe your method, which is pretty easy to say in terms of ratios:

Adjusted Batting Ratio = Batting Ratio * [Adjusted League BR / League BR]

For example:

Batting Average = .280 translates to Batting Ratio = .389.

League BA = .250 translates to League BR = .333.

Adjusted League BA = .260 translates to Adjusted League BR = .351.

Now the formula gives Adjusted BR = .389 * [.351/.333] = .410.

Then you can convert this back into averages, to get an Adjusted BA = A(.410) = .291.

As you noticed about Babe Ruth's home runs, this can give nonsense answers in extreme cases when you use it for multiple stat categories. For instance, if you use it for SO and HR (compared to a baseline of AB), then it is very possible that the method says Babe Ruth today would have more SO+HR than AB, since he had more of both of these relative to his league, and both SO and HR are much more common today. (I think it's very hard to find a simple method that avoids this problem.)

7:28 PM Aug 21st
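The cancellation rezk42 describes can be checked line by line against the fifteen-line worksheet. Write p for the player's average (Line 1), l for his league's average (Line 2), and a for the alternate league average (Line 6). Let L(x) be the Log5 formula, R(x) = x/(1 - x) be rezk42's ratio, A(y) = y/(1 + y) be the conversion back, and L_3, L_4, L_7 be the values on Lines 3, 4 and 7. The one extra fact needed is that the Log5 of any comparison u/(u + v) is u/(2v).

```latex
\begin{aligned}
L(x) &= \frac{x}{2(1-x)} = \tfrac{1}{2}R(x), \qquad
L\!\left(\frac{u}{u+v}\right) = \frac{u}{2v},\\[4pt]
\text{Line 9} &= L(\text{Line 5}) = \frac{L_3}{2L_4}, \qquad
\text{Line 10} = L(\text{Line 8}) = \frac{L_4}{2L_7},\\[4pt]
\text{Line 13} &= L(\text{Line 11}) = \frac{\text{Line 9}}{2\,\text{Line 10}}
  = \frac{L_3 L_7}{2L_4^{2}} = \frac{R(p)\,R(a)}{2\,R(\ell)^{2}},\\[4pt]
\text{Line 14} &= L(1-\ell) = \frac{1-\ell}{2\ell} = \frac{1}{2R(\ell)},\\[4pt]
\text{Line 15} &= \frac{\text{Line 13}}{\text{Line 13}+\text{Line 14}}
  = \frac{R(p)R(a)/R(\ell)}{1 + R(p)R(a)/R(\ell)}
  = A\!\left(\frac{R(p)\,R(a)}{R(\ell)}\right).
\end{aligned}
```

Every factor of 2 is gone by Line 15, which is why dropping it changes nothing. What remains is exactly the Adjusted Batting Ratio formula in the comment, converted back to an average.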
 
murrayj
I've always used the following method (applied to 1927 Ruth as an example). It's much simpler than this, but it produces more believable numbers for Ruth.

Basically it assumes that any outcome is 50% the batter and 50% the league (pitchers, ballparks, equipment, etc.)

So, in 1927 Ruth's outcome for HR% was 8.7% and the league's contribution to that was the league average of 0.9%

To figure Ruth's contribution to the outcome we do (outcome*2)-league avg, in this case: (8.7*2)-0.9=16.5%

Then to figure Ruth's performance in the new league, in this case 2019, we add the league average to Ruth's calculated contribution and divide by 2: (16.5+3.7)/2= 10.1%

If we apply this HR% to Ruth's 691 PA we get 70 expected HR in a 2019 environment.

To get batting average we do this same thing with BB%, K%, and BABIP

Ruth 1927 for those was 19.8, 12.9, and .338
1927 lgAVG for those was 8.4, 7.1, and .303

Ruth's contribution to his outcomes is a 31.2 BB%, 18.7 K%, and .373 BABIP

Then we average these with 2019's AL lgAVG of 8.5 BB%, 23 K%, and .298 BABIP to get Ruth's expected 2019 numbers:

BB% 19.85
K% 20.85%
BABIP .3355

We can use this with PA to figure AB and BIP and then figure AVG and OBP

691 PA and a 19.85 BB% yields 137 BB. Subtract that from PA and get 554 AB, BUT Ruth had 14 SH and in 2019 a hitter like Ruth would never sac bunt. So add 14 more ABs and he is at 568. The 20.85 K% leads to 144 expected Ks in 691 PA. If we take the 70 HR and 144 Ks away from the 568 ABs we get 354 BIP.

We expect Ruth to hit .3355 on BIP so that is 119 non-HR hits. Add the 70 HR and we have 189 H in 568 AB for a .333 AVG

Final expected 2019 line:

.333 AVG, .472 OBP, 137 BB, 144 Ks, 70 HR

Why is this not an accurate method for answering questions like this?
12:03 AM Aug 13th
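murrayj's 50/50 rule is short enough to write out directly. This Python sketch uses a hypothetical function name and reproduces the comment's home-run arithmetic for 1927 Ruth in a 2019 AL environment. It implements the commenter's rule, not the Log5 adjustment from the article.

```python
def fifty_fifty(player_rate, old_league, new_league):
    """murrayj's rule: an outcome is half batter, half league environment."""
    batter_part = 2 * player_rate - old_league   # strip out the old league's half
    return (batter_part + new_league) / 2        # add back the new league's half


hr_pct = fifty_fifty(8.7, 0.9, 3.7)              # HR per PA, in percent
print(round(hr_pct, 2), round(691 * hr_pct / 100))
```

Algebraically the rule reduces to the player's rate plus half the change in the league rate. It is therefore an additive shift and, unlike the Log5 adjustment, is not held between 0 and 1 for extreme inputs.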
 
chuck
Regarding the Ruth thing - where one is trying to use the method to see his stats through the lens of a different era - I think one might use the method in a 2-pronged approach. One might first need to use it for strikeout rate, as that is a thing that's gone through the roof and will affect how many batted balls Ruth would have. If Ruth struck out around 13% of the time in a league where batters struck out 8% of the time, one might find with this method that his strikeout rate would be pegged in the high 30% range in a 2019 context.

Given 691 PA, and let's say 270 strikeouts, the next thing might be to estimate how many walks a hitter striking out that much might get. Hitters in 2019 that struck out over 30% of the time got walked about 9% of the time. Let's say, then, that Ruth strikes out 37% (256) and is walked 9% (62). And gets hit 1% of the time (7). That leaves just 366 batted balls.

From that, one might again go back to the method and plug in hits/batted ball rates for Ruth (.426) and 1927 (.311), and 2019's rate (.337), and find that Ruth's rate translates to something like .460. Times 366 batted balls = 168 hits, or a .262 batting average when you add the strikeouts back in.

From that, I would say one might break the 165 down into the same 1b-2b-3b-HR ratios he had in '27, giving him 51 homers. With 262 K's and a .270 average, it's kind of a Khris Davis profile, with more K's but a bit better average.
12:27 AM Aug 1st
 
jfenimore
Reminds me of the old story Ralph Kiner repeated over and over about "What would Ty Cobb hit if he were playing today?"
2:49 PM Jul 31st
 
shthar
You might have just revealed Hal Richman's secret formula.


2:43 PM Jul 31st
 
bjames
For this to truly work, it has to be reasonable WITHOUT boundaries, rather than reasonable WITHIN boundaries. Reasonable within boundaries is easy; a hundred methods can do that. Reasonable without boundaries is what makes this method different.

I'm wondering if there is a glitch in the way I have explained this. Using willibphx's approach, one runs into an illogical sequence very quickly. Let's say that Ruth in 1927 translates to 190 Home Runs in 2019 (I actually ran the numbers myself, and I got 200 HR, but let's not get hung up on that.) But if you figure his HITS in 2019, they would obviously go DOWN from 1927. Ruth had only 192 hits in 1927, so, adjusting his hits and adjusting his home runs, he would wind up with more home runs than hits. Since that is impossible, I would say that if that is the actual conclusion, then the method does not work.

I think I must have mis-explained it, and I think I'll remove this article from the site until I have a chance to test what I wrote more thoroughly. It wouldn't be the first time I have mis-explained this method.


I know that when I do studies like that, what I do is this. I assign the hitter a "P9" level, which is (PA - HR) / PA. For Babe Ruth, 1927, this would be .913; if a random number comes up and is greater than .913, then Ruth has a Home Run. The P8 level is (PA - HR - 3B) / PA; if a random number comes up that is greater than that number but less than .913, then Ruth has a Triple. The P7 number is the "floor" for doubles, P6 is the floor for hits, P5 is the floor for walks, P4 the floor for Hit Batsmen, P3 the floor for Sacrifice Bunts, P2 the floor for Fly Ball Outs, P1 the floor for Ground Ball Outs, and less than P1, the result is a strikeout.

Within a league I assign both the pitcher and the hitter P9, P8 ... P2, P1 scores, and the league also; then you compare the hitter to the league, the pitcher to the league, and generate a new set of P9, P8, P7, etc. scores. Then you can transport them to a different league. I'm confident that if done correctly, that cannot result in more Home Runs than hits; it cannot result in a higher value for P8 than for P9, etc. So the fact that this does happen suggests that I screwed up the explanation somewhere and somehow. So I'll remove this article in a day or two. ...not HIDING my mistake; I merely don't want to perpetuate it... then I'll try to figure out where I went wrong, and try again.

Bill
10:51 AM Jul 31st
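The "floor" scheme in the comment above is inverse-CDF sampling over cumulative outcome frequencies: each outcome owns the band from its floor up to the next floor. Here is a minimal Python sketch, using a hypothetical 600-PA stat line (not real data) and hypothetical function names. It covers only a single hitter's floors; the comment's further step of comparing hitter, pitcher and league floors is not shown.

```python
import bisect
import random

# Outcomes ordered from the lowest floor (0) up to P9, as in the comment
ORDER = ("SO", "GB out", "FB out", "SH", "HBP", "BB", "1B", "2B", "3B", "HR")

# Hypothetical 600-PA stat line, for illustration only (not real data)
HITTER = {"SO": 90, "GB out": 150, "FB out": 135, "SH": 5, "HBP": 5,
          "BB": 60, "1B": 100, "2B": 30, "3B": 5, "HR": 20}


def floors(counts):
    """Cumulative floor for each outcome; the last one is P9 = (PA - HR) / PA."""
    pa = sum(counts[o] for o in ORDER)
    result, running = [], 0
    for outcome in ORDER:
        result.append(running / pa)
        running += counts[outcome]
    return result


def draw(floor_list, u):
    """Outcome whose band [floor, next floor) contains u, for u in [0, 1)."""
    return ORDER[bisect.bisect_right(floor_list, u) - 1]


f = floors(HITTER)
rng = random.Random(1927)
sample = [draw(f, rng.random()) for _ in range(10_000)]
print(f"P9 = {f[-1]:.3f}; simulated HR rate {sample.count('HR') / len(sample):.3f}")
```

With these made-up counts, P9 is .967, and the simulated home-run rate should land near 20/600 (about .033).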
 
willibphx
First a correction, was using 2019 AL.

And because it is fun to play what if.

Nolan Ryan in 1972/73/74 had 329/383/367 strikeouts.

Adjusted to 2019 AL 467/585 and 579.

BBs not affected as much, going from 157/162/202 to 160/154/206.

Pitch counts would be fantastic
9:39 AM Jul 31st
 
willibphx
Good to see this methodology explained in such detail. It seems reasonable within boundaries. When applied to lower probabilities it seems to give strange results. For example, I used it to project 1927 Ruth HRs to 2020 AL. Ruth in 1927 hit HRs on 9.2025% of his PA versus a league average of 0.9004% HR/PA. Came back with 190 HR based on a 2020 AL average of 3.5638%. In fairness to the methodology, Ruth's numbers tend to cause problems with a lot of mathematical methods.
9:18 AM Jul 31st
 
Jaytaft
Just caught a mistake in my last comment: I typed "3-year .345 streak" but it should be "4-year .345 streak."
11:24 PM Jul 30th
 
Jaytaft
This is breathtaking. I can't thank you enough.

Since 1960, Alou's 4-year .330 streak (5 times: Alou, Carew, Boggs, Gwynn, Pujols) has been as rare as a 35-game hitting streak: Rose (44), Molitor (39), Jimmy Rollins (38), Luis Castillo (35), Utley (35).

With the adjustment, Alou's streak becomes a 3-year .345 streak, which has been as rare (3 times, with Boggs and Gwynn) as winning a Triple Crown.

Revenge, indeed!
10:47 PM Jul 30th
 
chuck
Thanks, Bill, for this very informative article. I am sure many of us will flag it and use it in the future.

It’s interesting that it was Matty Alou and the year 1968 that initiated this. While I don’t doubt that the Log5 method is reliable on the whole, in this specific instance I think Alou may perhaps have been the type of hitter least affected by the conditions of that season, and that his 1968 average, in a more normal 1960s batting average context, might well have been close to the same.

The reason is that in 1968 (and the last third of 1967), a deader baseball was likely sapping power in both leagues, but especially in the NL. If one compares 1968 to a 3-year average for 1964-1966 in singles, doubles, triples and homers per AB, it looks like this:

Period | 1B/AB | 2B/AB | 3B/AB | HR/AB
1964-66 average | 18.4% | 3.8% | 0.8% | 2.4%
1968 | 18.4% | 3.6% | 0.7% | 1.6%

Alou, being mostly a singles hitter, would seem to lose little in 1968, while home runs per AB were down 31%. Strikeouts, incidentally, were up a bit in ’68 from 1964-66 levels, but only from 15.3% to 15.7% per batter faced.

If 1967 is added to that table, one can see when home runs changed that year, with the season split before and after August:
Period | 1B/AB | 2B/AB | 3B/AB | HR/AB
1964-66 average | 18.4% | 3.8% | 0.8% | 2.4%
1967 April-July | 17.9% | 4.0% | 0.8% | 2.3%
---------------------------------------
1967 August-Sept. | 18.7% | 3.7% | 0.7% | 1.6%
1968 | 18.4% | 3.6% | 0.7% | 1.6%

12:43 PM Jul 30th
 
John-Q
Yeah, I forgot about Tom Herr and his 100 RBIs; that was odd. I remember that he reached 100 with only 8 HR, which was seen as a real oddity at the time. I’m not sure, but I think it was the first time since the dead-ball era that a guy with fewer than 10 HR drove in 100.

I was a big Mets fan at the time and I hated that Cardinals team. They traded for a 34-year-old Cesar Cedeno, and he hit .434 with a 1.200 OPS during August and September. I was so happy when the Royals won that WS. It’s one of my favorite WS of all time. I became a quasi-Royals fan after that.

And then Tudor didn’t even make the All-Star team that year. As great a season as Gooden had, Tudor actually led the league in WHIP (.938) and shutouts (10). Tudor would have won the Cy Young in almost any other year.

Gooden had an odd season for one so dominant. He didn’t lead the league in WHIP or shutouts; Tudor did. He didn’t lead the league in K/9 or H/9; Sid Fernandez did. Hershiser led the league in HR/9 and W/L%. LaMarr Hoyt led the league in BB/9. Eckersley led the league in K/BB ratio.

Gooden must have been very clutch that season because he had a 9.3 WPA which is the 4th best single season by a pitcher in MLB history.
11:22 AM Jul 30th
 
bjames
That 1985 Cardinal team was an odd team in retrospect because you had an MVP fluke season by McGee on one side and then you had a Cy Young caliber fluke season by Tudor on the pitching side. Tudor would have won the Cy Young in a normal season without ‘85 Doc Gooden.

Tommy Herr drove in 110 runs; that was pretty odd. Tudor threw 10 shutouts--a record which would never be broken if it was in fact a record--and did not win the Cy Young Award.
10:22 AM Jul 30th
 
John-Q
What an odd season by Willie McGee in 1985. I guess it was early in his career, so people didn’t think of it as a fluke season. His median Win Shares over 11 full seasons was 15, and he put up 36 Win Shares in 1985.

That 1985 Cardinal team was an odd team in retrospect because you had an MVP fluke season by McGee on one side and then you had a Cy Young caliber fluke season by Tudor on the pitching side. Tudor would have won the Cy Young in a normal season without ‘85 Doc Gooden.
9:42 AM Jul 30th
 
bjames
Cash, of course, is .361, not .365.

It has been 11 years since anyone has beaten the league average by 100 points--the longest such gap in history. Longest previous was 9 years, 1948-1957.
9:06 AM Jul 30th
 
John-Q
I remember Pete Palmer had something similar to this in the first edition of “Total Baseball”. It was far more basic. I think he called it “Relative Average”. He would just take the league average and then subtract it from a player’s batting average. He had an essay attached referencing Pete Rose’s 1968 season and why his batting average that season was more impressive than it appeared on the surface.

So Rose hit .335 in 1968, but the league average was .243, which gave Pete a +.092 Relative Average.
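The Relative Average described here is a single subtraction. A minimal sketch follows, assuming the simple-difference form given in the comment (Total Baseball's exact definition is not confirmed here); `relative_average` is a hypothetical helper name.

```python
def relative_average(player_avg: float, league_avg: float) -> float:
    """Player batting average minus league batting average (difference form,
    as described in the comment; hypothetical helper)."""
    return player_avg - league_avg


rose_1968 = relative_average(0.335, 0.243)  # Rose 1968 vs. the NL average
print(f"{rose_1968:+.3f}")  # prints +0.092
```

A difference treats 92 points above a .243 league the same as 92 points above a .280 league, which is the kind of context the Log5 approach in the article is meant to account for.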
8:57 AM Jul 30th
 
 
©2024 Be Jolly, Inc. All Rights Reserved.