TEACHING STATISTICS USING BASEBALL
SOLUTIONS TO EXERCISES - CHAPTER 6

******************
Exercise 6.1

Number of
home runs hit   Count   Prob     Expected   P residual
------------------------------------------------------
0               87      0.6127   90.0700    0.1046
1               50      0.3001   44.1200    0.7836
2                8      0.0735   10.8000    0.7259
3                2      0.0120    1.7600    0.0327

Since the Pearson residuals are relatively small, it seems that the Poisson distribution is a good fit in this case.

******************
Exercise 6.3

(a) Binomial probabilities with n = 3 and p = .5679.

w                  0         1         2         3
-----------------------------------------------------
probability   0.0807    0.3181    0.4181    0.1832

(b)

w                  0         1         2         3
-----------------------------------------------------
expected      4.3566   17.1773   22.5758    9.8903

(c) Comparing the observed and expected counts, the binomial seems to be a good fit to these data.

******************
Exercise 6.5

Fitting a geometric model to run scoring data:

x     count   probability   expected
---------------------------------------------
1     39      0.2970        44.5500
2     39      0.2088        31.3187
3     22      0.1468        22.0170
4     14      0.1032        15.4780
5      9      0.0725        10.8810
6      4      0.0510         7.6493
7      9      0.0358         5.3775
8      5      0.0252         3.7804
9      4      0.0177         2.6576
10     2      0.0125         1.8683
11     2      0.0088         1.3134

Comparing the observed and expected counts, the model fits pretty well. The only discrepancy is we observe fewer first half-inning and more second half-inning counts. This might be due to a home-team advantage.

******************
Exercise 6.7

runs scored   observed   expected   residual
---------------------------------------------------
0             1687       1673       0.12
1              372        358       0.58
2              179        192       0.88
3               90        103       1.69
4               40         43       0.24
5               21         22       0.02
6                5          7       0.67
7                1          5       3.01
8                3          0
9                2          0

This model seems reasonably good in predicting the number of runs scored in an inning. The expected number of 3-run innings is a little high, but I see close agreement of the observed and expected columns.
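As a cross-check on Exercises 6.1, 6.3 and 6.5 above, the following Python sketch recomputes the fitted probabilities and expected counts from the tabulated data. Several inputs are assumptions rather than facts stated in the solutions: the Poisson rate is taken to be the sample mean of the home run counts, the "P residual" column is taken to be (observed - expected)^2 / expected, and the totals of 54 observations (Exercise 6.3) and 150 observations (Exercise 6.5) are inferred from the ratio of expected counts to probabilities. All function and variable names are illustrative.

```python
import math


def poisson_pmf(k, lam):
    """P(X = k) for a Poisson distribution with mean lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def binomial_pmf(k, n, p):
    """P(W = k) for a binomial distribution with n trials, success prob p."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)


def geometric_pmf(x, p):
    """P(X = x) for x = 1, 2, ... with success probability p."""
    return p * (1 - p) ** (x - 1)


def pearson_terms(observed, expected):
    """Squared Pearson residuals, (obs - exp)^2 / exp, one per category."""
    return [(o - e) ** 2 / e for o, e in zip(observed, expected)]


# Exercise 6.1: Poisson fit to the home run counts (categories 0, 1, 2, 3).
hr_counts = [87, 50, 8, 2]
n_obs = sum(hr_counts)
# Assumption: the rate is estimated by the sample mean.
lam = sum(k * c for k, c in enumerate(hr_counts)) / n_obs
poisson_probs = [poisson_pmf(k, lam) for k in range(len(hr_counts))]
poisson_expected = [n_obs * q for q in poisson_probs]
poisson_terms = pearson_terms(hr_counts, poisson_expected)

# Exercise 6.3: binomial probabilities with n = 3 and p = .5679.
n_trials, p_binom = 3, 0.5679
n_binom_obs = 54  # Assumption: inferred from expected / probability.
binom_probs = [binomial_pmf(w, n_trials, p_binom) for w in range(n_trials + 1)]
binom_expected = [n_binom_obs * q for q in binom_probs]

# Exercise 6.5: geometric model with p = 0.2970 (the tabulated P(X = 1)).
p_geom = 0.2970
n_geom_obs = 150  # Assumption: inferred from expected / probability.
geom_probs = [geometric_pmf(x, p_geom) for x in range(1, 12)]
geom_expected = [n_geom_obs * q for q in geom_probs]

if __name__ == "__main__":
    print("Exercise 6.1 (k, prob, expected, residual term)")
    for k, (q, e, t) in enumerate(
        zip(poisson_probs, poisson_expected, poisson_terms)
    ):
        print(f"{k:2d} {q:8.4f} {e:9.4f} {t:8.4f}")
    print("Exercise 6.3 (w, prob, expected)")
    for w, (q, e) in enumerate(zip(binom_probs, binom_expected)):
        print(f"{w:2d} {q:8.4f} {e:9.4f}")
    print("Exercise 6.5 (x, prob, expected)")
    for x, (q, e) in enumerate(zip(geom_probs, geom_expected), start=1):
        print(f"{x:2d} {q:8.4f} {e:9.4f}")
```

Under these assumptions the probabilities and expected counts agree with the tables to within rounding. The residual terms for Exercise 6.1 differ slightly in the third decimal place, which would be consistent with the tabulated values having been computed from expected counts rounded to two decimals.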
******************
Exercise 6.9

(a) On-base profiles for the two teams:

          f0       f1       f2       f3       f4
COL   0.2351   0.4899   0.1490   0.0281   0.0980
TBD   0.2423   0.5170   0.1652   0.0112   0.0643

(b) The probability that a team scores two runs with two base runners is

    f0*f4 + f1*f2 + f1*f3 + f1*f4
      + f2*f1 + f2*f2 + f2*f3 + f2*f4
      + f3*(f1+f2+f3+f4)
      + f4*(f0+f1+f2+f3+f4)

Using this formula, we get

COL prob = 0.3912
TBD prob = 0.3380

(c) The on-base probabilities for COL and TBD are .354 and .320. The number of runners per inning has a negative binomial distribution with r = 3 and p = 1 - OBP. The mean of this distribution is r / p. Applying this formula, we get

COL avg runners on base = 4.64
TBD avg runners on base = 4.41

(d) A team needs players who are good at getting on base and players who are good at advancing runners already on base.

******************
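The arithmetic in Exercise 6.9 (b) and (c) above can be checked with a short Python sketch. It only evaluates the two formulas as written in the solution, using the rounded on-base profiles from part (a) and the stated on-base probabilities. The dictionary and function names are illustrative, and because the inputs are rounded to four decimals, the results may differ from the reported values in the last digit.

```python
# On-base profiles (f0, f1, f2, f3, f4) copied from part (a).
profiles = {
    "COL": (0.2351, 0.4899, 0.1490, 0.0281, 0.0980),
    "TBD": (0.2423, 0.5170, 0.1652, 0.0112, 0.0643),
}

# On-base probabilities stated in part (c).
obp = {"COL": 0.354, "TBD": 0.320}


def two_runner_prob(f):
    """Evaluate the part (b) formula term by term, as written."""
    f0, f1, f2, f3, f4 = f
    return (
        f0 * f4
        + f1 * f2 + f1 * f3 + f1 * f4
        + f2 * f1 + f2 * f2 + f2 * f3 + f2 * f4
        + f3 * (f1 + f2 + f3 + f4)
        + f4 * (f0 + f1 + f2 + f3 + f4)
    )


def negative_binomial_mean(r, p):
    """Mean r / p, the formula used in part (c)."""
    return r / p


part_b = {team: two_runner_prob(f) for team, f in profiles.items()}
part_c = {team: negative_binomial_mean(3, 1 - x) for team, x in obp.items()}

if __name__ == "__main__":
    for team in profiles:
        print(f"{team}: prob = {part_b[team]:.4f}, mean = {part_c[team]:.2f}")
```

Writing the part (b) sum out term by term, rather than factoring it, keeps the code easy to compare against the formula in the text.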