Wednesday, August 29, 2007

Hitting

Contrary to popular opinion, I am not a math whiz. I only took one math class in college. Naturally, I went into a field that required a lot of statistics. In preparation, I took a stats class at night before I started grad school. I enjoyed the class, and one of the first things we covered was basic descriptive statistics. Descriptive stats, in case you don't already know, summarize data; they are not inferential or analytical. Most people are familiar with these descriptive stats: mean (average), median (the halfway point in a sorted series of numbers), and mode (the most frequently occurring number). We also learned some simple graphical ways of representing numbers, including box plots, histograms and stem and leaf plots, which help show patterns.

I knew I was going to like my stats class when one of the very first examples in our book used a baseball example. It compared the home run hitting careers of Babe Ruth and Roger Maris. Maris broke Ruth's single season home run record in 1961 but Ruth was a better home run hitter overall. Here are the stem and leaf plots:

Babe Ruth
0 | 2346
1 | 1
2 | 259
3 | 45
4 | 1166679
5 | 449
6 | 0
Total: 714


Roger Maris
0 | 589
1 | 346
2 | 368
3 | 39
4 |
5 |
6 | 1
Total: 275

The stem and leaf plot is simple, but telling. The stem is the first part of the number and the leaf is the second part. (For a good description, click here.) Basically, it shows the shape and distribution of the number of home runs each player hit. It's not plotted against time or anything, but for most players their first and last years are in the low numbers, like the "05" and "13" years. What the plots tell you is that Babe Ruth was the far more consistent HR hitter. Maris can't touch him. Recently, I got to wondering what the stem and leaf plots of the other single season home run (near) record holders would look like compared to the lifetime home run (near) record holders.
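If you'd rather let a computer do the sorting, here is a minimal Python sketch of how a stem and leaf plot gets built. The function name `stem_and_leaf` is my own invention, and the season totals are simply read back off Maris's plot above (so they are in sorted order, not career order). It assumes whole numbers where the stem is the tens digit and the leaf is the ones digit.

```python
def stem_and_leaf(values):
    """Build stem and leaf rows: stem = tens digit, leaf = ones digit."""
    leaves = {}
    for v in sorted(values):
        stem, leaf = divmod(v, 10)  # e.g. 61 -> stem 6, leaf 1
        leaves.setdefault(stem, []).append(str(leaf))
    rows = []
    # Include empty stems too, so gaps in the distribution stay visible.
    for stem in range(min(leaves), max(leaves) + 1):
        row = f"{stem} | {''.join(leaves.get(stem, []))}"
        rows.append(row.rstrip())
    return rows

# Roger Maris's season home run totals, read off the plot above.
maris = [5, 8, 9, 13, 14, 16, 23, 26, 28, 33, 39, 61]

rows = stem_and_leaf(maris)
for row in rows:
    print(row)
print("Total:", sum(maris))
```

Keeping the empty stems (the 40s and 50s for Maris) is the whole point: the gap is what makes the 61 jump out.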

First, there's Mark McGwire. He broke Maris's record in 1998 when he hit 70 home runs. How much of an outlier was his best HR season? Not much, but he's still no Babe Ruth.
0 | 399
1 |
2 | 29
3 | 22399
4 | 29
5 | 28
6 | 5
7 | 0
Total: 583
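For a rough check on that "not much of an outlier" call, here is a small Python sketch using the 1.5 x IQR rule that box plots use to flag outliers. That rule is my choice for illustration, not something the textbook example used, and the name `upper_fence` is made up. The season totals are read off McGwire's plot above.

```python
import statistics

# Mark McGwire's season home run totals, read off the plot above.
mcgwire = [3, 9, 9, 22, 29, 32, 32, 33, 39, 39, 42, 49, 52, 58, 65, 70]

def upper_fence(values):
    """Upper box plot fence: third quartile plus 1.5 times the IQR."""
    q1, _median, q3 = statistics.quantiles(values, n=4)
    return q3 + 1.5 * (q3 - q1)

fence = upper_fence(mcgwire)
best = max(mcgwire)
is_outlier = best > fence  # True only if the best season clears the fence

print("Best season:", best)
print("Upper fence:", fence)
print("Outlier by the 1.5 x IQR rule?", is_outlier)
```

Different quartile conventions move the fence around a little, so treat this as a sanity check rather than a verdict.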

Sammy Sosa was McGwire's competition for the crown in 1998. Sosa is still playing and McGwire retired in 2001, but the two of them are well matched.
0 | 48
1 | 0456
2 | 5
3 | 3566
4 | 009
5 | 0
6 | 346
Total: 604 (and counting)

Let's now look at Hammering Hank Aaron. His consistency is astonishing. Eight years with 40 plus home runs? Damn. He broke Ruth's HR record by 41. Ruth had seven years where he hit in the 40s and four where he hit 50 plus, but it's consistency that wins this race. That, and a very long career.
1 | 023
2 | 04679
3 | 0244899
4 | 00444457
Total: 755

Last, Barry Bonds. He now holds both the single season HR record and the career total record. The last person to do that was Babe Ruth. Look at how Bonds did it: seven years in the 40s and the one year of hitting 73 (which really is an outlier). It's all about consistency. Of all these hitters, whose pattern is closest to Aaron's? It's Bonds. Like him or not, he achieved this record much like Aaron did. And in one fewer season (though I believe the seasons are longer now, so maybe it's an equivalent number of games).
0 | 5
1 | 69
2 | 45566
3 | 334477
4 | 0255669
5 |
6 |
7 | 3

Total: 760 (and counting)

For some context, here are the top ten all time home run hitters:
Rank  Player (age)        Home Runs  Bats
1.    Barry Bonds* (42)   760        L
2.    Hank Aaron+         755        R
3.    Babe Ruth+          714        L
4.    Willie Mays+        660        R
5.    Sammy Sosa (38)     604        R
6.    Ken Griffey* (37)   590        L
7.    Frank Robinson+     586        R
8.    Mark McGwire        583        R
9.    Harmon Killebrew+   573        R
10.   Rafael Palmeiro     569        L

You can find the full list here.

Grateful for: baseball.
