I appreciate the response about why one shouldn't adjust for league quality using Win Shares.
I know you mentioned in the "Win Shares" book that one might be able to use the value to evaluate trades.
Somehow adjusting for league quality would also be extremely useful in Hall of Fame selection debates.
An NL slugging first baseman from the 1950s (oh, say Gil Hodges) might look better compared to AL counterparts if his values (whether WS, WAR, etc.) correctly accounted for league quality.
As for WAR, that value DOES adjust for league quality. However, the exact way BBref is doing it (and whether it's the correct way) is a bit of a mystery.
Some of us have been discussing this on the reader boards.
In the 2nd Historical Abstract, you listed some 16 possible methods for addressing league quality. Batting by pitchers compared to their league was among them.
If you had the time, how might you proceed? (I've read the conversation you had with Tom in 2018 on cross-era comps.)
Asked by: DefenseHawk
The first thing I should tell you is that tackling this problem is probably beyond your skill set. I don’t know you and I don’t know what your skill set is, of course, but what I am saying is that this is a REALLY complicated problem. If your skill set is at the level of John Dewan, Tom Tango, Ben Jedlovec, etc. then maybe you can deal with this, but if it isn’t, then I doubt that you’ll be able to produce anything. I did have an employee once, Matthew Namee, who was capable of doing this, so you never know; some people just have skills.
The second thing I should tell you is that anything you get out of your research is probably going to provide very limited insight into the Hall of Fame qualifications of anyone. The difference between the quality of the American League vs. the National League in the same years is just not large enough to be meaningful in evaluating the performance of individual players, with a few exceptions. There have been exceptional eras when one league got to be significantly stronger than the other for a few years, but as a generalization, there’s just nothing meaningful there. Gil Hodges hit .273 with 370 homers. If you move that to the other league, same park, it might be 374 homers and .276 or something, but... that’s it. That’s all you are going to find. If you find numbers larger than that, it is probably because you’re doing something wrong.
It’s an entirely different scale. Differences between leagues don’t operate on the same scale as differences between players. I spent 17 years trying to explain this to Red Sox scouts and completely failed, but... Which is larger, the difference between two players or the difference between two teams? Obviously, the difference in skill level between two players in the same league is MUCH larger than the difference between two teams.
OK, which is larger: the difference between two teams, or the difference between two leagues? For the exact same reasons and to essentially the same degree, the difference between two teams is MUCH larger than the difference between two leagues at the same competitive level. You can walk it back one more level: which is larger, the difference between players’ individual SKILLS, or the difference between their overall skill levels? For the same reason, the differences in individual skills are MUCH larger than the differences in overall skill levels. The more you aggregate, the smaller the relative differences become.
I spent 18 years trying without success to explain this to Red Sox scouts. There are major league players who never get drafted because they play in the wrong league in college. You’d see some guy who hit .410 with 22 homers and 70 stolen bases in 65 games (and don’t misunderstand me, the Red Sox never drafted anybody based on their stats), but you’d ask for reports on this guy, and somebody would say, "Yeah, but we don’t know anything about the quality of the pitching in that league." Well, obviously, but here’s the thing. People think about differences between leagues as being on the scale of differences between players, but they’re not. They’re not anywhere NEAR that level. They’re 6 to 10% of that level. The aggregate level scale has been condensed 10 to 15 times. So you see a guy who has Babe Ruth level stats in college, it’s not all that relevant whether he’s facing Roger Clemens level pitching or Mike Gardiner level pitching, because the difference between a Roger Clemens level LEAGUE and a Mike Gardiner level league is only 6 to 10% of the difference between a Roger Clemens level player and a Mike Gardiner level player. It’s not really that big a deal.
Getting now to the issue of why the related questions here are difficult to analyze. There are many, many measurements which are indicators of the quality of play within a league, I would say about 40 different things; let’s say there are 35 small and weak indicators. In any given year 15-25 of them are going to point in the right direction, and 10 to 20 are going to point in the wrong direction.
You CAN draw reliable conclusions based on data like that; you can. But it’s really, really hard. There isn’t a large or dominant measure of league quality, American League to National, until inter-league play begins in 1997. After 1997, it’s fairly easy; up until then, it’s like herding statistical kitty cats. And before you can start herding these statistical kitty cats, you have to create the data. To completely create all of the data that I would like to have to study this issue would take me, I would guess, two to three years.
And you know in advance that you’re not really headed toward any big payoff. I mean, probably you narrow the year in which the National League moved ahead of the American League from "sometime between 1948 and 1957" to "1953 or 1954", but that’s about it. You’re never going to get paid for the three years it will take you to do that research, and since very few people will ever read the research, very few people will ever believe that you actually do know what you actually do know.
Nonetheless, let me try to outline as best I can from memory what those 35 or 40 indicators of relative league quality might be, in case somebody wants, against my advice, to take on the research. It is a really interesting question; not a very important question, but an extremely interesting one.
Interleague Play is the largest and dominant element of the research, since 1997.
Prior to 1997, the strongest indicator that we’ve got of the quality of play is players moving from league to league.
In the years 1901 to 1903, the American League did not attract SOME of the National League’s stars; the American League lured away MOST of the National League’s stars, certainly over 50%. In the seven years after that, the American League teams did a far better job of finding and developing their own talent, coming up with Walter Johnson, Ty Cobb, Tris Speaker, Eddie Collins, Home Run Baker, Shoeless Joe Jackson, Ed Walsh, Smoky Joe Wood and others. Combining them with the stars who came over from the National League 1901-1903 (Nap Lajoie, Cy Young, Ed Delahanty, Sam Crawford and many others), there were more superstars in the American League in that era than in any other league, ever. The National League in the same era had three very good teams and five non-competitive, fumbling-around, wasting-time teams.
What I am trying to get to is, the number of long-term stars in a league is an important indicator of the quality of play in the league. Well, they’re all weak indicators; this one is just a little stronger than some others. If you divide the players in the league into major and minor stars versus background level players (average and replacement level players) the background level talent is fairly stable. The quality of the stars is much more variable, thus a better indicator. Hall of Fame selections are useful info here, but of course you have to steer around the Frankie Frisch/selection bias problem.
Back to the issue of inter-league movements. When the Federal League operated for two years (1914-1915), it provided a conduit for a few players to move between leagues. But the 1903 peace treaty between the American and National Leagues did not create a process to make trades between the leagues, and so until the late 1950s, there was no such process. The only players who moved between leagues were those who were released and signed in the other league, often with a visit to the minor leagues between. These player movements, although the information is of some limited value, do not provide a reliable index of the relative strength of the leagues.
(Occasionally things would happen. In August, 1949, Johnny Mize was suddenly "waived" by all the National League teams, and signed by the New York Yankees. But such events were not common.)
Post-season, 1959, an interleague trading period was established for the period of the winter meetings, and then there was a second inter-league period and the time frame was expanded, etc., until interleague trades became common. As that happened (1960-1975) there was much more interleague movement, thus strengthening the value of the indicator. When free agency started (1976) movement between leagues became yet more common. In that era, movements between leagues are the best thing we have to compare the relative strength of the leagues.
Before then, we have:
The World Series, and
The All-Star Games
But that’s just a few games a year so it doesn’t mean a lot. Weighted over a period of years, it can be taken to be an indicator.
There are all kinds of things which are "internal indicators" of the strength of a league. The record in interleague play and the movement of players between leagues can be looked at as external indicators of league quality. There is a much wider field of internal indicators. An internal indicator is one which works without a direct comparison to any other league. For example, the league age spectrum.
A high quality league will have very, very few players in the league who are 18, 19, 20 years old, or who are 36 or older. The more players in the league who are at ages well off prime, the weaker the league. You can make this into an indicator of the quality of play in this way:
Subtract 27 from each player’s age,
If the player is OLDER than 27, divide the result by 2,
Square that number,
Subtract the result from 100, and
Divide by 100.
If a player is 17, this will give a result of 0% (.00); if he is 18, you get 19%; at 19, you get 36%; at 20, you get 51%; at 21, you get 64%; at 22, 75%; at 23, 84%; at 24, 91%; at 25, 96%; at 26, 99%; and at 27, 100%.
Going down the slope post-27, at 28 you get 99.75%; at 29, 99%; at 30, 97.75%; at 31, 96%; at 32, 93.75%; at 33, 91%; at 34, 87.75%, etc. You can figure it out from there. At age 42, you’re back down to 43.75%.
Then, for every non-pitcher in the league, you multiply his plate appearances by that percentage. Call that his "age weighted plate appearances." In the same way, for a pitcher, you can create age weighted innings.
Total up the age weighted plate appearances for the league, and divide by the total plate appearances for the league. The result is the... what do we call it? We’ll call it LASE, the league age spectrum evaluation.
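For anyone who wants to try this, the steps above can be sketched in a few lines of Python. This is only a minimal sketch: the function names and the three-player sample are made up for illustration, and it does nothing special for ages below 17 or above 47, where the formula as described would go negative.

```python
def age_weight(age):
    """Weight from the steps above: 1.0 at age 27, lower on either side."""
    diff = age - 27
    if diff > 0:
        diff = diff / 2  # older than 27: halve the distance from 27
    # Square, subtract from 100, divide by 100.
    return (100 - diff ** 2) / 100


def lase(players):
    """League age spectrum evaluation from (age, plate_appearances) pairs."""
    weighted_pa = sum(age_weight(age) * pa for age, pa in players)
    total_pa = sum(pa for _, pa in players)
    return weighted_pa / total_pa


# Hypothetical three-player "league", for illustration only (not real data).
sample = [(22, 400), (27, 600), (36, 200)]
print(round(lase(sample), 4))
```

The same function would work for pitchers if you pass innings in place of plate appearances.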
THIS IS A UNIVERSAL, ACROSS-THE-BOARD indicator of the quality of play in a league. It will always work, not with regard to every league but with regard to every class of leagues. It will work in comparing a AAA league to a Single-A league, a Single-A league to a Rookie League. Go back to 1954, and it will work in comparing an A league to a B league, a B league to a C league. It will work in comparing the SEC to an NAIA league, or in comparing a league of low-level universities to a league of Junior College teams. It will work in Japan. It will work in comparing slow-pitch softball leagues.
In baseball, the two most obvious backward steps in the quality of play are World War II baseball and the first expansions, which expanded the major leagues by 50% in a period of nine years (1961-1969). Both of those backward steps WERE accompanied by drops in the league’s age spectrum evaluation, but not immediate or dramatic ones. I should add this: that those backward steps in reality were not nearly as large as contemporary reporters suggested that they were. World War II is often described as baseball played by teenagers and old men, but the LASE did not actually drop meaningfully until 1945. In 1945 it DID drop to very low numbers in both leagues, but in 1942-1944 there is hardly any impact on the major league age spectrum. At the time of the first expansions there was no immediate change in the age spectrum, but then a couple of years later, when the Ed Kranepools and Rusty Staubs started to pile up, there was a dip.
But this is what I am trying to get to: that there are things you can look at which are universal and really kind of obvious, but which do provide useful information about the quality of play within a league. If you combine ENOUGH indicators of that nature, then you start to get a pretty clear picture of the quality of play within a league.
If a league is segregated, all black or all white, then that obviously is an indication of weakness for that league.
The number and variation of international players is an indication of strength in the league. If the league pulls in players from Thailand and Australia and Venezuela, they’re working at it.
The attendance in the league is an indicator of the strength of the league. If one league draws three million fans per team and the other league two million, the one that draws three million fans per team is probably the stronger league. Again, that’s universal. I don’t know the facts, but I’d bet you anything that the SEC draws more fans than the Mountain West League. In 1954, the International League drew larger attendance than the Three-I League.
The record-keeping within the league is an indicator of the strength of the league. If they don’t keep track of RBI, walks and strikeouts by hitters, it is probably not a strong league. If they’re recording Exit Velocities, they’re probably a strong league.
The number of coaches employed in the league is an indicator of league quality.
The experience level of those coaches/managers is an indicator. If a head coach has been employed in baseball for 25 years, he probably has some skills. If he has been working in baseball for three years, we have less confidence of that.
Competitive balance within a league is an indicator of quality. If one team plays .700 baseball and another plays .250 baseball, that’s probably not a quality league.
In sabermetrics, we normally treat statistics as RELATIVE records which have no absolute meaning. This is what defined sabermetrics in the early days: that we would say that hitting .300 in Dodger Stadium in the 1960s might be more impressive than hitting .370 in Sportsman’s Park in 1922. People thought we were nuts, but we eventually won the argument; the whole world eventually came around to see it our way.
But once in a while, maybe 5% of the time, there IS some absolute meaning in a stat. There are a few stats which are systematically higher in a higher quality league than in a lower quality league. For example, fielding percentage. If you go through a 1954 Baseball Guide (pick your year), you will find that fielding percentages were higher in the major leagues than in the A leagues, higher in the A leagues than in the B leagues, higher in the B leagues than in the C leagues, higher in the C leagues than in the D Leagues.
For some of you, your first response will be that fielding percentages were higher in the better leagues because of better field conditions. Well, yes, of course. But that doesn’t invalidate the generalization. If you have a high-level league and a low-level league and one has a well-maintained field and the other does not, which do you think is the stronger league?
Wild Pitches and Passed Balls are more common in lower level leagues than in higher level leagues, particularly (as I recall) passed balls.
I also seem to recall that Hit Batsmen are more common in lower-level leagues, but I would want to check that out before I did any research based on that belief.
Lop-sided games (16-1, 22-3, etc.) are more common in lower leagues.
Triples and Inside the Park Home Runs are more common in lower leagues.
Double Plays are slightly more common in better leagues.
Why?
Triples, Inside the Park Home Runs and Double Plays are long-sequence events. Long-sequence events are indicators. They stand out in the data. Several things have to be "right" to get an inside the park home run.
Generally, high standard deviations are indicative of lower quality play. If the league batting average is .270 but with a standard deviation of 22 points, that’s probably a stronger league. If the league batting average is .270 but with a standard deviation of 45 points, that’s a weaker league.
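A rough sketch of how that comparison might be run, in Python. The two lists of batting averages are invented for illustration, and the sketch assumes a simple unweighted standard deviation across players; whether to weight by plate appearances or set a playing-time minimum is a choice left open here.

```python
from statistics import mean, pstdev


def spread_in_points(averages):
    """Population standard deviation of batting averages, in points (.001)."""
    return pstdev(averages) * 1000


# Invented averages for two made-up leagues; both have a .270 league mean.
tight_league = [0.250, 0.260, 0.270, 0.280, 0.290]
wide_league = [0.210, 0.240, 0.270, 0.300, 0.330]

for name, avgs in [("tight", tight_league), ("wide", wide_league)]:
    print(name, round(mean(avgs), 3), round(spread_in_points(avgs), 1))
```

By the rule of thumb above, the league with the wider spread would be read as the weaker one.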
Also from memory, I believe that you can show that the percentage of runs scored by home runs is higher in high-quality leagues than in lower-quality leagues. I’m not certain of that fact and I don’t like it, anyway, so I’ll let that pass for now.
Runs Not Driven In (RNBI); that is, Runs Scored in the League minus RBI in the league, divided by Runs Scored... that turns out to be a surprisingly strong indicator of league quality. RNBI are runs produced by Errors, Wild Pitches, Balks, Passed Balls, and a few other events, so it somewhat replicates things we are measuring in other ways, but I believe that RNBI are actually the strongest single indicator of the quality of play within a league that I am aware of. However, for whatever reason, this indicator suggests that the American League was somewhat stronger than the National League in the years 1962-65, which I am fairly certain is not true.
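The arithmetic is simple enough to sketch in Python. The function name and the league totals below are hypothetical, not real figures. Since these runs come largely from errors, wild pitches and passed balls, a lower rate would presumably point toward the stronger league.

```python
def rnbi_rate(runs, rbi):
    """Share of league runs not credited as RBI: (R - RBI) / R."""
    if runs <= 0:
        raise ValueError("runs must be positive")
    return (runs - rbi) / runs


# Hypothetical league totals, for illustration only.
print(rnbi_rate(5000, 4650))  # 0.07
```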
Any of these statements—and probably I have got some of them wrong—but any of these statements can be confirmed by studying the frequency of events in different levels of minor league ball, different levels of college ball, or different levels of high school baseball. They’re consistent variables.
If you want to compare the quality of leagues in the years Gil Hodges played, there is a vast amount of material that you can study—but there is no straight line toward the answer. All you can do is piece together little indicators. It’s a fascinating subject. Good luck to you.
Later—I got interested enough in this subject to do a few studies about it. I’ll publish these next week.