1. DNA
Here’s a baseball-statistics problem that I have thought about for 30 years and have no solution to.
How do you estimate the probability that two given seasons were produced by the same player? Let us take these two seasons:
G | AB | R | H | 2B | 3B | HR | RBI | BB | SO | SB | CS | Avg | OBA | SPct | OPS
157 | 612 | 128 | 207 | 41 | 13 | 36 | 123 | 107 | 38 | 3 | 0 | .338 | .438 | .624 | 1.062
157 | 593 | 127 | 200 | 53 | 9 | 30 | 113 | 105 | 32 | 3 | 4 | .337 | .437 | .609 | 1.046
Those are the 1949 and 1953 seasons of Stan Musial. One can easily recognize that those are two seasons by the same player, because they are so much alike; they are within the same parameters in every category, thus easily recognizable as products of the same baseball DNA.
Or take these two seasons:
G | AB | R | H | 2B | 3B | HR | RBI | BB | SO | SB | CS | Avg | OBA | SPct | OPS
152 | 578 | 124 | 205 | 30 | 12 | 32 | 108 | 98 | 40 | 4 | 5 | .355 | .449 | .614 | 1.063
128 | 454 | 68 | 124 | 24 | 7 | 5 | 51 | 61 | 45 | 7 | 0 | .273 | .364 | .390 | .754
That’s one season by Stan Musial and one season by Johnny Wyrostek, and one can easily recognize those two seasons as being the products of two very different players. But take these two seasons:
G | AB | R | H | 2B | 3B | HR | RBI | BB | SO | SB | CS | Avg | OBA | SPct | OPS
155 | 603 | 115 | 197 | 39 | 10 | 34 | 120 | 56 | 64 | 21 | 9 | .327 | .381 | .594 | .974
153 | 545 | 117 | 176 | 32 | 7 | 37 | 124 | 71 | 64 | 22 | 3 | .323 | .404 | .611 | 1.015
Those two obviously could be two seasons of the same player, but as it happens, they’re not; that’s the 1961 season of Henry Aaron and the 1961 season of Frank Robinson. Or take these two seasons:
G | AB | R | H | 2B | 3B | HR | RBI | BB | SO | SB | CS | Avg | OBA | SPct | OPS
154 | 649 | 113 | 208 | 48 | 13 | 25 | 86 | 54 | 38 | 9 | 7 | .320 | .374 | .550 | .925
150 | 599 | 82 | 147 | 22 | 12 | 3 | 48 | 55 | 47 | 56 | 14 | .245 | .310 | .337 | .647
You probably wouldn’t suppose that those are two seasons by the same player, but as it happens, they are; those are the 1937 and 1943 seasons of Wally Moses. Or take these two seasons:
G | AB | R | H | 2B | 3B | HR | RBI | BB | SO | SB | CS | Avg | OBA | SPct | OPS
130 | 235 | 26 | 51 | 15 | 2 | 6 | 26 | 26 | 56 | 11 | 7 | .217 | .295 | .374 | .670
154 | 604 | 104 | 179 | 35 | 4 | 31 | 82 | 77 | 107 | 38 | 16 | .296 | .377 | .522 | .899
Those are the 1968 and 1970 seasons of Tommy Harper.
There are two elements to this problem, which are similarity and uniqueness. Any season by Ted Williams is pretty easy to identify as a Ted Williams season because there really isn’t anybody else who hits .350, .360 in a typical season with 35 homers, 140 walks and 32 strikeouts. Stan Musial’s two seasons (above) are identifiable because they are very similar, but also because Musial is fairly unique; not as unique as Williams, but still fairly unique.
Rickey Henderson is notable in this respect. Think about it this way: Can you take seasons from other players, and make a Rickey Henderson career out of them? You can’t. There aren’t any other players who are enough like Rickey Henderson that you can make somebody look like Rickey Henderson. Take Willie Mays, for example; you can make a "Willie Mays" career out of seasons by other players. Mays’ 1965 season is very similar to Frank Robinson’s 1966 season. You can take a season of Frank Robinson, a season of Duke Snider, a season out of Mantle, a couple of seasons from Henry Aaron, a season from Dick Allen, a season from Barry Bonds, a season from Vlad Guerrero, and you can make out of them what looks for all the world like a Willie Mays career. You can do that with Jimmie Foxx; take a season of Hack Wilson, a season of Greenberg, a season of Gehrig, etc.
Mays was a greater player than Henderson, but you can’t do that with Rickey Henderson; you can’t do it with Henderson or Ted Williams, probably not Joe Morgan, maybe not Mantle, maybe not Adam Dunn. There are only maybe a dozen players in baseball history that you can’t "match" with seasons by other players.
Similarity and uniqueness are related concepts but distinguishable concepts. All of these seasons are fairly similar, but they’re all by different players:
G | AB | R | H | 2B | 3B | HR | RBI | BB | SO | SB | CS | Avg | OBA | SPct | OPS
105 | 399 | 81 | 110 | 15 | 9 | 8 | 70 | 31 | 35 | 19 | 0 | .276 | .351 | .419 | .769
144 | 505 | 59 | 137 | 25 | 2 | 10 | 67 | 46 | 58 | 3 | 5 | .271 | .335 | .388 | .723
114 | 420 | 53 | 117 | 19 | 3 | 10 | 65 | 24 | 37 | 2 | 0 | .279 | .322 | .410 | .732
151 | 591 | 59 | 160 | 25 | 9 | 9 | 56 | 27 | 42 | 2 | 0 | .271 | .310 | .389 | .700
121 | 424 | 49 | 117 | 23 | 5 | 9 | 57 | 37 | 40 | 5 | 5 | .276 | .334 | .417 | .752
137 | 485 | 50 | 135 | 17 | 4 | 7 | 56 | 28 | 47 | 4 | 5 | .278 | .317 | .373 | .691
153 | 553 | 66 | 150 | 20 | 4 | 10 | 59 | 51 | 72 | 6 | 5 | .271 | .332 | .376 | .708
134 | 489 | 66 | 133 | 29 | 2 | 9 | 66 | 39 | 57 | 3 | 2 | .272 | .326 | .395 | .721
146 | 514 | 58 | 142 | 26 | 2 | 8 | 69 | 52 | 69 | 14 | 11 | .276 | .340 | .381 | .721
151 | 505 | 59 | 138 | 24 | 1 | 10 | 63 | 64 | 67 | 2 | 2 | .273 | .353 | .384 | .737
130 | 444 | 52 | 123 | 25 | 2 | 8 | 61 | 37 | 57 | 6 | 7 | .277 | .329 | .396 | .725
127 | 438 | 68 | 120 | 28 | 4 | 10 | 69 | 41 | 51 | 7 | 7 | .274 | .331 | .425 | .755
128 | 503 | 66 | 139 | 30 | 4 | 10 | 61 | 41 | 76 | 15 | 5 | .276 | .335 | .412 | .746
156 | 526 | 67 | 147 | 29 | 3 | 9 | 63 | 58 | 81 | 5 | 2 | .279 | .355 | .397 | .752
112 | 392 | 57 | 107 | 29 | 4 | 8 | 67 | 49 | 57 | 15 | 13 | .273 | .353 | .429 | .782
138 | 433 | 49 | 118 | 33 | 3 | 7 | 55 | 30 | 40 | 11 | 6 | .273 | .319 | .411 | .730
162 | 647 | 80 | 177 | 25 | 9 | 9 | 59 | 38 | 70 | 5 | 6 | .274 | .313 | .382 | .695
141 | 484 | 68 | 131 | 33 | 1 | 10 | 58 | 38 | 48 | 9 | 3 | .271 | .321 | .405 | .726
131 | 411 | 51 | 113 | 20 | 3 | 10 | 66 | 36 | 46 | 6 | 4 | .275 | .330 | .411 | .742
129 | 456 | 68 | 125 | 33 | 3 | 7 | 59 | 45 | 68 | 5 | 4 | .274 | .345 | .406 | .751
145 | 504 | 60 | 136 | 27 | 4 | 10 | 63 | 50 | 68 | 3 | 1 | .270 | .337 | .399 | .736
129 | 502 | 57 | 138 | 29 | 3 | 7 | 59 | 44 | 55 | 6 | 3 | .275 | .332 | .386 | .719
138 | 503 | 69 | 136 | 22 | 2 | 10 | 55 | 37 | 64 | 6 | 3 | .270 | .317 | .382 | .699
Chronologically, those are seasons by Jake Beckley (1896), Blimp Hayes (1936), Johnny McCarthy, Danny Litwhiler, Bob Kennedy, Danny Cater, Buddy Bell, Mike Ivie, Joel Youngblood, Jim Sundberg, Ken Griffey Sr., Shane Mack, Scott Cooper, Luis Gonzalez, Keith Lockhart, Neifi Perez, B. J. Surhoff, Morgan Ensberg, Tony Graffanino, David Bell, Mark Kotsay and Edgar Renteria (2008).
We can do that because those seasons are historically common, whereas the Rickey Henderson, Ted Williams-type seasons are historically scarce. You can measure similarity; you can measure uniqueness. By measuring both similarity and uniqueness and relating them in the right format, it would have to be possible to estimate the likelihood that any two seasons were the products of the same player. But I don’t know how to do that.
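As a rough illustration of the "you can measure similarity" half of the problem, here is a minimal Python sketch. It does not solve the probability question, and it is not a method taken from this article: the function name `season_distance`, the choice of a mean relative difference, and the floor of 10 used to damp small-count categories such as triples and caught stealing are all assumptions made for illustration. The stat lines are copied from the first two tables above.

```python
# Counting stats in the column order of the tables above (G through CS).
# The rate stats (Avg, OBA, SPct, OPS) are left out because they are
# largely determined by the counting stats.
MUSIAL_1949 = [157, 612, 128, 207, 41, 13, 36, 123, 107, 38, 3, 0]
MUSIAL_1953 = [157, 593, 127, 200, 53, 9, 30, 113, 105, 32, 3, 4]
# Second row of the Musial/Wyrostek table; presumably Wyrostek's line.
OTHER_SEASON = [128, 454, 68, 124, 24, 7, 5, 51, 61, 45, 7, 0]

# Hypothetical floor: keeps a 0-versus-4 gap in a small category from
# counting as heavily as a large gap in at bats or hits.
SMALL_COUNT_FLOOR = 10


def season_distance(a, b):
    """Mean relative difference across categories; 0.0 means identical."""
    total = 0.0
    for x, y in zip(a, b):
        scale = max((x + y) / 2, SMALL_COUNT_FLOOR)
        total += abs(x - y) / scale
    return total / len(a)


same_player = season_distance(MUSIAL_1949, MUSIAL_1953)
different_players = season_distance(MUSIAL_1949, OTHER_SEASON)
print(f"Musial 1949 vs. Musial 1953:  {same_player:.3f}")
print(f"Musial 1949 vs. other season: {different_players:.3f}")
```

A smaller number means the two seasons are more alike. Turning a distance like this into a probability would still require the uniqueness half of the problem: how many other players produced seasons within the same distance.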
2. Witless
In the article "Wit", which I wrote a few weeks ago, I was trying to talk about the exaggerated claims that the Republican Party is degenerating, and this finally reminded me of somebody else.
I went to the SABR convention in 1977; I think it was in Chicago. SABR at that time was an organization of maybe 300 people, and there were about 75 people at the convention, almost all of whom were hobbyists, rather than the professionals and quasi-professionals who dominate the organization now. I went to lunch with this guy, long since dead, who was telling me about his "projections" for players for the next season. "The thing about my system," he said, "is that in my system, sometimes a player will project to hit negative home runs next season." John Mayberry, for example, had hit 34 homers in 1975 but only 13 homers in 1976. He thus projected that Mayberry would hit negative 8 home runs in 1977.
I remember hearing that and thinking, as you would, "Jesus Christ; I’ve gone to lunch with a complete idiot." He was an idiot, but he was ahead of his time in a certain way, which was that he realized that it would be interesting if you could project what players would do next season. He didn’t have any idea how to do it, but he had figured out that it would be interesting. That was something.
The moment was important to me, because it pushed to the forefront of my mind the need to understand where he had gone wrong; in other words, by the sheer crushing weight of his naked stupidity, he had pointed out a problem that needed attention. I remember thinking, within moments of his making that statement, that "I’ll bet the opposite is true. I’ll bet that, when a player’s home run total goes down sharply in one season, it usually goes back UP the next season."
Whether it "probably" goes up or down the next season depends on whether the player remains a regular. If you look at players who remain regulars, then yes, a large decline is normally followed by a recovery, not by another large decline. On the other hand, many players who have large declines in home runs never play regularly again, and thus never recover (although very few of them actually succeed in hitting negative home runs).
In general, though, as long as you are operating within parameters, resistance is more powerful than momentum. If two teams are in a fairly even contest but one of them has momentum, always bet on the team that DOESN’T have the momentum. If a team declines in one season from 90 wins to 80, their wins in the next season will tend to be 85, not 70. If a player hits .300 one season and .250 the next, he will tend to hit .275 in the third season, not .200. Resistance is much more powerful than momentum.
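The arithmetic behind those examples can be sketched in a few lines of Python. The function names are hypothetical. The "momentum" rule simply extends the last change one more season, which is consistent with the negative-8 projection for Mayberry; the "resistance" rule assumes exactly half of the change is given back, a figure chosen only because it reproduces the 85-win and .275 examples, not because it is a tested projection method.

```python
def momentum_projection(previous, current):
    """Assume the last change repeats: extend the trend one more season."""
    return current + (current - previous)


def resistance_projection(previous, current):
    """Assume half of the last change is given back (an illustrative choice)."""
    return (previous + current) / 2


# (previous season, current season) pairs taken from the text.
examples = {
    "Mayberry home runs": (34, 13),
    "team wins": (90, 80),
    "batting average": (0.300, 0.250),
}

for label, (previous, current) in examples.items():
    m = momentum_projection(previous, current)
    r = resistance_projection(previous, current)
    print(f"{label}: momentum {m:.3f}, resistance {r:.3f}")
```

The momentum rule is the only one of the two that can produce an impossible result such as a negative home run total, because nothing in it pulls the projection back toward the player's established level.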
The same with the alleged degeneration of the Republican Party. Yes, it is possible that the Republican Party will implode, and their place in the political forum will be taken by the Libertarians; it is possible. This would be parallel to the player who drops from 34 homers to 13 homers and then loses his status as a regular, thus hitting 5 home runs in the third season (or negative 8).
But it is much, much more likely that the Republican Party will recover and win the off-year elections in 2014. The Yammering Heads on TV who talked endlessly after the election about the dreadful problems of the Republican Party are, in my view, on the same level as that nitwit who talked about John Mayberry hitting negative eight home runs in 1977. They have blinded themselves to the obvious parameters of the problem.
3. Opinions and Ideas
What is the difference between an opinion and an idea?
I generally despise opinions—yours, mine, and especially somebody else’s. I do not listen to talk shows, sports or otherwise, and I don’t like to allow myself to give my opinions in print, although I do of course. When people ask for my opinion about something, I will try to invent a way to approach the question quantitatively and objectively, rather than simply giving my opinion.
I am, on the other hand, entirely driven by ideas. It occurred to me then to ask: what exactly is the difference between an idea and an opinion?
Back up one... what is the difference between "philosophy" and "a philosophy"? I don’t know anything about philosophy, but I am consumed with the effort to find better ways to think about problems. I recognize the contradiction. No doubt I should have educated myself about philosophy. I didn’t. I’m an ignoramus; just ignore me.
"Philosophy" is the search for truth, the effort to understand the true nature of the problem. Philosophers often reach for the true nature of the universe, which may be why we ignore them; we all know, intuitively, that no one can figure out the true nature of the universe. Let us say that philosophy is the effort to understand the nature of a problem.
"A philosophy", on the other hand, is an organized way of thinking about a problem. We all develop A political philosophy, A philosophy about raising children, A philosophy about education, A philosophy about sports. A philosophy is the back-end result of philosophy; A philosophy is the excrement of philosophy. When there is no food value left in food, your body gets rid of it. When you stop trying to figure out the true nature of the problem, then you have A philosophy about the problem. A philosophy is what you have when you’re done thinking.
I despise opinions because opinions are barriers to thought. An "opinion" is formed by the intersection of a live topic (Roger Clemens’ acquittal, the shooting in Connecticut, Hillary Clinton’s health problems) with A general philosophy. I don’t believe in Any general philosophy, as a rule, because I don’t think that anyone understands the world or that anyone’s way of thinking systematically about the world holds up to scrutiny. Any moron can see what is wrong with either liberalism or conservatism, if he merely has the intellectual integrity to admit it, just as any moron can easily see the flaws in Christianity, Judaism, or atheism. We can’t move on from there to A philosophy that does work, however, because we’re simply not smart enough to construct one. The world is billions of times more complicated than the human mind; therefore, none of us can develop A philosophy that consistently explains new and diverse phenomena. There is no doubt a name for this philosophy.
An idea, on the other hand, has no first object (such as Clemens, Connecticut, guns or Hillary), and it has no philosophy. An idea is formed not from A philosophy and an issue, but from a question and some avenue of thought running into that question. How do I measure this? What is the value in this? What is the potential in this?
People sign on to liberalism and conservatism not because they are too stupid to see their flaws; no one is that stupid. People sign on to them because they cannot stand to live with unanswered questions. The source of all anxiety is unanswered questions. We need answers. We prefer bad answers to the lack of an answer. The court system will sometimes convict innocent people of terrible crimes, simply because they cannot stand for the crime to go unpunished, for the riddle to go unanswered. The answers offered by liberalism and conservatism, by stoicism and cynicism, are childish and uniformly ugly—but they are answers. They provide us with a way to walk up to a problem, pick an avenue and walk away from the problem; thus, they carry us away from the horrible problem of not knowing what to think.
That’s my philosophy. You’re welcome.
4. FLOPPING
Phil Taylor’s back-page column in Sports Illustrated (May 21, 2012) complains about basketball players "flopping" to draw charging calls. Reading it gave me one of those insights that I absolutely can’t believe I never had before: other than when he is in foul trouble, the rules of basketball create a powerful incentive for a player to do this. Think about it. If there is no foul on a possession, how many points per possession does a basketball team get? I really don’t know... I’m guessing 1.1 or something like that. (When there is an offensive rebound, I always count that as a continuation of the possession, rather than as a new possession; I am certain that from an analytical standpoint that is the right way to count them. You get a much lower figure if you figure it the other way.)
If the defender stands in front of a player and forces the referee to make a call, the call can go either way. If it is called against the defender, the points expected on the possession go up to something like 1.4, right? Probably a little less than 1.4, I don’t know; let’s say 1.40.
But if the foul is called against the man with the ball, the points expected go to zero. The potential GAIN on the play (from the standpoint of the defender) is 1.1 points. The potential loss is 0.3 points. So if the defender creates a 50/50 call, it’s a huge net plus for the defensive team.
I can’t believe that, having watched basketball all my life, I never realized that before. I realized that it was a net plus for the defender when the call went his way, of course, but what I never realized is that it is a net percentage plus for the defender when he creates the situation in which a foul must be called one way or the other. Even if the call goes against him 60% of the time, 70% of the time, it is still a net gain for his team.
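In other words, I saw THIS relationship: a charge called on the man with the ball takes the expected points on the possession from about 1.1 to zero.
But I failed (until now) to see this one: simply forcing the call, even at 50/50, takes the expected points on the possession from about 1.1 to about 0.7.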
It is the SECOND relationship that drives coaches to teach flopping, not the first one. I’ve always despised Mike Krzyzewski, as any decent human being does, for many reasons, one of which is that he teaches his team to create those situations for officials, and I dislike the flop. I dislike it, as Phil Taylor does, because it seems unmanly, sneaky, and unsportsmanlike.
It is all that, but it is unmanly, sneaky, unsportsmanlike and smart. Flopping isn’t a gamble; it’s an investment. If you create a 50/50 call for the official, you have gained something like 4/10ths of a point for your team (as long as you don’t put yourself in danger of fouling out). That’s why they can’t get rid of that play; the rules create an incentive to do it.
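The 4/10ths figure, and the claim that the play still pays when the call goes against the defender 60% or 70% of the time, can be checked with a short Python sketch. The point values are the guesses given above (1.1, 1.4 and zero), not measured data, and the function and variable names are hypothetical.

```python
NO_FOUL_POINTS = 1.1   # the guess above for a possession with no foul
BLOCK_POINTS = 1.4     # call goes against the defender
CHARGE_POINTS = 0.0    # call goes against the man with the ball


def expected_points(p_against_defender):
    """Expected points for the offense once a call is forced."""
    return (p_against_defender * BLOCK_POINTS
            + (1 - p_against_defender) * CHARGE_POINTS)


def defender_gain(p_against_defender):
    """Points saved by forcing the call instead of playing it straight."""
    return NO_FOUL_POINTS - expected_points(p_against_defender)


# Share of calls the defender can lose before forcing the call stops paying.
break_even = (NO_FOUL_POINTS - CHARGE_POINTS) / (BLOCK_POINTS - CHARGE_POINTS)

for p in (0.5, 0.6, 0.7):
    print(f"call against defender {p:.0%}: gain {defender_gain(p):+.2f}")
print(f"break-even: {break_even:.1%}")
```

Under these assumed values the defender comes out ahead until the call goes against him roughly 79% of the time.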
If you really want to get rid of that play, you have to change the rules to re-balance the incentives against creating the situation in which a foul must be called. There are various ways you could do that. You could, for example, make "flopping" its own foul call, rather than a subset of "blocking"; blocking is one foul call, flopping is a different foul call. If the referee calls a flop, then the offensive team gets two free throws and gets the ball back, rather than just getting the free throws. Then the expected points per possession if you flop and the call goes against you go to 2.50, rather than 1.40; thus, the expected points per possession at the moment of contact (but before the foul is called) go to something like 1.25. That removes the incentive to flop and try to draw a foul call, and that solves the problem.
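The effect of that rule change can be worked through the same way. This is a minimal Python sketch using only the figures in the paragraph above (1.10, 1.40, 2.50, and a 50/50 call); the names are hypothetical, and the 50/50 split is the article's working assumption rather than an observed rate.

```python
NO_FOUL = 1.10         # guess for a possession with no foul
CHARGE = 0.00          # foul called on the man with the ball
BLOCK_CURRENT = 1.40   # call against the defender under current rules
FLOP_PROPOSED = 2.50   # proposed: two free throws plus the ball back


def points_at_contact(penalty, p_against_defender=0.5):
    """Expected offensive points at the moment of contact, before the call."""
    return p_against_defender * penalty + (1 - p_against_defender) * CHARGE


current = points_at_contact(BLOCK_CURRENT)
proposed = points_at_contact(FLOP_PROPOSED)

# Under the proposed rule, the share of calls the defender could lose
# and still break even against simply playing the possession straight.
tolerable_loss_rate = NO_FOUL / FLOP_PROPOSED

print(f"current rules:  {current:.2f} vs. {NO_FOUL:.2f} with no foul")
print(f"proposed rule:  {proposed:.2f} vs. {NO_FOUL:.2f} with no foul")
print(f"defender breaks even if he loses {tolerable_loss_rate:.0%} of calls")
```

With these numbers a 50/50 call costs the defense about 0.15 points instead of saving 0.40, and the flop only pays for a defender who wins the call well over half the time.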