| Team | W | L | R | RA | Team | W | L | R | RA |
|---|---|---|---|---|---|---|---|---|---|
| Anaheim | 82 | 80 | 864 | 869 | Milwaukee | 73 | 89 | 740 | 826 |
| Arizona | 85 | 77 | 792 | 754 | Minnesota | 69 | 93 | 748 | 880 |
| Atlanta | 95 | 67 | 810 | 714 | Montreal | 67 | 95 | 738 | 902 |
| Baltimore | 74 | 88 | 794 | 913 | New York_AL | 87 | 74 | 871 | 814 |
| Boston | 85 | 77 | 792 | 745 | New York_NL | 94 | 68 | 807 | 738 |
| Chicago_AL | 95 | 67 | 978 | 839 | Oakland | 91 | 70 | 947 | 813 |
| Chicago_NL | 65 | 97 | 764 | 904 | Philadelphia | 65 | 97 | 708 | 830 |
| Cincinnati | 85 | 77 | 825 | 765 | Pittsburgh | 69 | 93 | 793 | 888 |
| Cleveland | 90 | 72 | 950 | 816 | San Diego | 76 | 86 | 752 | 815 |
| Colorado | 82 | 80 | 968 | 897 | San Francisco | 97 | 65 | 925 | 747 |
| Detroit | 79 | 83 | 823 | 827 | Seattle | 91 | 71 | 907 | 780 |
| Florida | 79 | 82 | 731 | 797 | St. Louis | 95 | 67 | 887 | 771 |
| Houston | 72 | 90 | 938 | 944 | Tampa Bay | 69 | 92 | 733 | 842 |
| Kansas City | 77 | 85 | 879 | 930 | Texas | 71 | 91 | 848 | 974 |
| Los Angeles | 86 | 76 | 798 | 729 | Toronto | 83 | 79 | 861 | 908 |

To look for the Pythagorean relationship, we compute log(W/L) and log(R/RA) for all teams and construct a scatterplot of the two quantities in Figure 4-8.

Figure 4-8: Scatterplot of the log of the runs ratio against the log of the ratio of wins to losses for Major League teams in the 2000 season.
The graph shows a positive linear pattern, indicating that log(W/L) and log(R/RA) are indeed linearly related.
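As a minimal sketch of the computation behind Figure 4-8, the Python below computes the two log ratios for a few teams from the table. It assumes natural logarithms; since the base only rescales both axes by the same factor, it does not affect the fitted slope later on. The `TEAMS` list and the `log_ratios` helper are illustrative names, not part of the original analysis.

```python
import math

# (team, W, L, R, RA) for a few teams, copied from the 2000 season table above
TEAMS = [
    ("Atlanta", 95, 67, 810, 714),
    ("Minnesota", 69, 93, 748, 880),
    ("San Francisco", 97, 65, 925, 747),
    ("Houston", 72, 90, 938, 944),
]


def log_ratios(w, l, r, ra):
    """Return the point (log(R/RA), log(W/L)) using natural logarithms."""
    return math.log(r / ra), math.log(w / l)


for team, w, l, r, ra in TEAMS:
    x, y = log_ratios(w, l, r, ra)
    print(f"{team:14s} log(R/RA) = {x:+.3f}   log(W/L) = {y:+.3f}")
```

Teams that outscore their opponents get a positive log(R/RA); Houston, which scored almost as many runs as it allowed but finished 72-90, is the kind of team that lands well below the trend.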
Next we want to fit a “best line” to this graph. It seems natural to require this line to pass through a particular point. If a team scores the same number of runs as it allows (R = RA), then we expect the team to win half of its games (W = L). Since log(1) = 0, the point (log(R/RA), log(W/L)) = (0, 0) should fall on the line. With this restriction, we look at line fits of the form
$$\log\left(\frac{W}{L}\right) = k \log\left(\frac{R}{RA}\right).$$
We choose k by a least-squares criterion: the sum of squared residuals is minimized when k = 1.91. Figure 4-9 shows this best line on the scatterplot, along with a plot of the corresponding residuals. We see no linear trend or other pattern in the residual plot, so the fit appears satisfactory.
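For a line forced through the origin, the least-squares value of k minimizes Σ(y_i − k x_i)², where x = log(R/RA) and y = log(W/L). Setting the derivative −2Σx_i(y_i − k x_i) to zero gives k = Σx_i y_i / Σx_i². The Python below is a sketch of that calculation using all 30 teams from the table; the names `DATA` and `fit_through_origin` are hypothetical. It should print a slope close to the k = 1.91 reported above (about 1.9), though small differences from the text are possible.

```python
import math

# (W, L, R, RA) for the 30 Major League teams in the 2000 season table above
DATA = {
    "Anaheim": (82, 80, 864, 869), "Milwaukee": (73, 89, 740, 826),
    "Arizona": (85, 77, 792, 754), "Minnesota": (69, 93, 748, 880),
    "Atlanta": (95, 67, 810, 714), "Montreal": (67, 95, 738, 902),
    "Baltimore": (74, 88, 794, 913), "New York_AL": (87, 74, 871, 814),
    "Boston": (85, 77, 792, 745), "New York_NL": (94, 68, 807, 738),
    "Chicago_AL": (95, 67, 978, 839), "Oakland": (91, 70, 947, 813),
    "Chicago_NL": (65, 97, 764, 904), "Philadelphia": (65, 97, 708, 830),
    "Cincinnati": (85, 77, 825, 765), "Pittsburgh": (69, 93, 793, 888),
    "Cleveland": (90, 72, 950, 816), "San Diego": (76, 86, 752, 815),
    "Colorado": (82, 80, 968, 897), "San Francisco": (97, 65, 925, 747),
    "Detroit": (79, 83, 823, 827), "Seattle": (91, 71, 907, 780),
    "Florida": (79, 82, 731, 797), "St. Louis": (95, 67, 887, 771),
    "Houston": (72, 90, 938, 944), "Tampa Bay": (69, 92, 733, 842),
    "Kansas City": (77, 85, 879, 930), "Texas": (71, 91, 848, 974),
    "Los Angeles": (86, 76, 798, 729), "Toronto": (83, 79, 861, 908),
}


def fit_through_origin(xs, ys):
    """Least-squares slope k for the no-intercept line y = k * x."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / sxx


xs = [math.log(r / ra) for (w, l, r, ra) in DATA.values()]  # log(R/RA)
ys = [math.log(w / l) for (w, l, r, ra) in DATA.values()]   # log(W/L)
k = fit_through_origin(xs, ys)

# Residuals from the fitted line: the values shown in a residual plot
residuals = [y - k * x for x, y in zip(xs, ys)]
print(f"k = {k:.2f}")
print(f"sum of squared residuals = {sum(e * e for e in residuals):.4f}")
```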

Figure 4-9: Least-squares fit (top) and residual plot (bottom) for (R/RA, W/L) data.
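
The least-squares line is

$$\log\left(\frac{W}{L}\right) = 1.91 \log\left(\frac{R}{RA}\right),$$

or equivalently, after exponentiating both sides,

$$\frac{W}{L} = \left(\frac{R}{RA}\right)^{1.91}.$$

This is pretty close to James’ Pythagorean relationship

$$\frac{W}{L} = \left(\frac{R}{RA}\right)^{2},$$

which uses the power of 2. How useful is this rule in predicting a team’s number of wins? To check the accuracy of the fitted relationship in prediction, Table 4-22 gives each team’s actual number of wins, the predicted number of wins (using the above model with k = 1.91), and the residual (actual - predicted). Figure 4-10 displays a stemplot of the absolute residuals |actual - predicted|.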
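To turn a predicted ratio W/L into a predicted win total, let G = W + L be the number of games played and solve W/(G − W) = (R/RA)^k, which gives W = G(R/RA)^k / (1 + (R/RA)^k). The Python below is a minimal sketch of that step; the function name `predicted_wins` is hypothetical, and it assumes G is each team’s actual W + L (Florida, for example, played 161 games). With k = 1.91 it reproduces entries of Table 4-22 such as Atlanta’s 90.7; passing `k=2` gives James’ original version instead.

```python
def predicted_wins(w, l, r, ra, k=1.91):
    """Predicted wins from W/L = (R/RA)**k, holding games played G = W + L fixed."""
    games = w + l
    ratio = (r / ra) ** k          # predicted W/L ratio
    return games * ratio / (1 + ratio)


# A few teams from the 2000 season table: (team, W, L, R, RA)
for team, w, l, r, ra in [
    ("Atlanta", 95, 67, 810, 714),
    ("New York_NL", 94, 68, 807, 738),
    ("Florida", 79, 82, 731, 797),
    ("Houston", 72, 90, 938, 944),
]:
    pred = predicted_wins(w, l, r, ra)
    print(f"{team:12s} actual {w}  predicted {pred:.1f}  residual {w - pred:+.1f}")
```

For Atlanta, using the power 2 instead of 1.91 raises the prediction from about 90.7 to about 91.2 wins, a small change compared with the residuals in the table.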
Table 4-22: Number of wins, predicted number of wins, and residuals using the Pythagorean relationship with the fitted power k = 1.91.

| Team | W | predicted | residual | Team | W | predicted | residual |
|---|---|---|---|---|---|---|---|
| Anaheim | 82 | 80.6 | 1.4 | Milwaukee | 73 | 72.5 | 0.5 |
| Arizona | 85 | 84.8 | 0.2 | Minnesota | 69 | 68.5 | 0.5 |
| Atlanta | 95 | 90.7 | 4.3 | Montreal | 67 | 65.6 | 1.4 |
| Baltimore | 74 | 70.2 | 3.8 | New York_AL | 87 | 85.7 | 1.3 |
| Boston | 85 | 85.7 | -0.7 | New York_NL | 94 | 87.9 | 6.1 |
| Chicago_AL | 95 | 92.8 | 2.2 | Oakland | 91 | 92.2 | -1.2 |
| Chicago_NL | 65 | 68.1 | -3.1 | Philadelphia | 65 | 68.8 | -3.8 |
| Cincinnati | 85 | 86.8 | -1.8 | Pittsburgh | 69 | 72.3 | -3.3 |
| Cleveland | 90 | 92.7 | -2.7 | San Diego | 76 | 74.8 | 1.2 |
| Colorado | 82 | 86.9 | -4.9 | San Francisco | 97 | 97.3 | -0.3 |
| Detroit | 79 | 80.6 | -1.6 | Seattle | 91 | 92.6 | -1.6 |
| Florida | 79 | 73.9 | 5.1 | St. Louis | 95 | 91.8 | 3.2 |
| Houston | 72 | 80.5 | -8.5 | Tampa Bay | 69 | 69.9 | -0.9 |
| Kansas City | 77 | 76.6 | 0.4 | Texas | 71 | 70.3 | 0.7 |
| Los Angeles | 86 | 88.0 | -2.0 | Toronto | 83 | 76.9 | 6.1 |


| Stem | Leaves |
|---|---|
| 0 | 23455779 |
| 1 | 22344668 |
| 2 | 027 |
| 3 | 12388 |
| 4 | 39 |
| 5 | 1 |
| 6 | 11 |
| 7 | |
| 8 | 5 |

Figure 4-10: Stemplot of the absolute residuals from the fitted Pythagorean relationship with k = 1.91 (stems are whole games, leaves are tenths of a game).
The stemplot shows that 24 of the 30 absolute residuals are smaller than 4. In other words, for 80% of the teams, this formula predicts the number of wins to within 4 games.
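The stemplot and the 80% figure can be checked directly from the residuals in Table 4-22. The Python below is a minimal sketch; the helper `stemplot` is a hypothetical name, and it uses whole games as stems and tenths of a game as leaves, matching Figure 4-10.

```python
from collections import defaultdict

# Residuals (actual - predicted) copied from Table 4-22, in table order
RESIDUALS = [1.4, 0.5, 0.2, 0.5, 4.3, 1.4, 3.8, 1.3, -0.7, 6.1,
             2.2, -1.2, -3.1, -3.8, -1.8, -3.3, -2.7, 1.2, -4.9, -0.3,
             -1.6, -1.6, 5.1, 3.2, -8.5, -0.9, 0.4, 0.7, -2.0, 6.1]


def stemplot(values):
    """Map each stem (whole games) to a string of sorted leaves (tenths)."""
    leaves = defaultdict(list)
    for v in values:
        tenths = round(abs(v) * 10)      # e.g. 4.3 -> 43: stem 4, leaf 3
        leaves[tenths // 10].append(tenths % 10)
    # Include empty stems (such as 7) so gaps in the distribution stay visible
    return {stem: "".join(str(d) for d in sorted(leaves[stem]))
            for stem in range(max(leaves) + 1)}


plot = stemplot(RESIDUALS)
for stem, leaf_str in plot.items():
    print(f"{stem} | {leaf_str}")

within_4 = sum(abs(r) < 4 for r in RESIDUALS)
print(f"{within_4} of {len(RESIDUALS)} teams "
      f"({within_4 / len(RESIDUALS):.0%}) predicted to within 4 wins")
```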