Statistical Inference, Second Edition. George Casella and Roger L. Berger. Duxbury Advanced Series.
Solutions Manual for Statistical Inference, Second Edition. George Casella, University of Florida. Roger L. Berger, North Carolina State University. Damaris.
b. Proof of ii: Let B1 and B2 be the two sigma algebras. Otherwise the sum of the probabilities would be infinite. Thus there are 2n possible choices. There are n pieces on which the two numbers match. There are many ways to obtain this. Here is one. The denominator is $n^n$ because this is the number of ways to place n balls in n cells.
The numerator is the number of ways of placing the balls such that exactly one cell is empty. There are $\binom{n}{2}$ ways of picking the 2 balls to go into this cell. Think of the n variables as n bins. Differentiating with respect to one of the variables is equivalent to putting a ball in the bin.
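The balls-in-cells count above can be checked by brute force for small n. The sketch below assumes the full counting argument gives $\binom{n}{2} n!$ favorable placements (n choices for the empty cell, n - 1 for the cell that holds two balls, $\binom{n}{2}$ for the pair of balls, and (n - 2)! for the remaining balls). The function names are illustrative and do not come from the text.

```python
from itertools import product
from math import comb, factorial


def exactly_one_empty_brute(n):
    """Count placements of n distinct balls into n cells leaving exactly one cell empty."""
    count = 0
    # placement[i] is the cell that receives ball i; there are n**n placements in total.
    for placement in product(range(n), repeat=n):
        if len(set(placement)) == n - 1:  # exactly one cell is unused
            count += 1
    return count


def exactly_one_empty_formula(n):
    """Closed-form count assumed in the lead-in: C(n, 2) * n!."""
    return comb(n, 2) * factorial(n)


for n in range(2, 7):
    brute = exactly_one_empty_brute(n)
    assert brute == exactly_one_empty_formula(n)
    print(n, brute, brute / n ** n)  # n, favorable count, probability
```

Enumeration grows as $n^n$, so this check is only practical for small n; the closed form is what one would use in general.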
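Thus there are $7^{12}$ equally likely sample points. There are several different ways that the calls might be assigned so that there is at least one call each day. Consider first the pattern with 6 calls on one day and 1 call on each of the other six days. There are 7 ways to specify the day with 6 calls.
There are $\binom{12}{6}$ ways to specify which of the 12 calls are on this day. And there are 6! ways of assigning the remaining 6 calls to the remaining 6 days.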
We will now count another pattern: 4 calls on one day, 2 calls on each of two other days, and 1 call on each of the remaining four days. The number of sample points with this pattern is $7\binom{12}{4}\binom{6}{2}\binom{8}{2}\binom{6}{2}4!$. Here is a list of all the possibilities and the counts of the sample points for each one. Thus the probability is increasing in p, and the minimum is at zero.
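The pattern-by-pattern count for the telephone-call problem (12 calls, 7 days, at least one call each day) can be reproduced mechanically. The sketch below is not the manual's own computation. It enumerates every way to split 12 into 7 positive day counts, counts the sample points for each pattern with a multinomial argument, and cross-checks the total against an inclusion-exclusion count. All function names are hypothetical.

```python
from collections import Counter
from math import comb, factorial


def surjection_count(calls, days):
    """Assignments of distinct calls to days with no day empty (inclusion-exclusion)."""
    return sum((-1) ** k * comb(days, k) * (days - k) ** calls for k in range(days + 1))


def partitions(total, parts, largest=None):
    """Yield non-increasing tuples of `parts` positive integers that sum to `total`."""
    if largest is None:
        largest = total
    if parts == 0:
        if total == 0:
            yield ()
        return
    # The first part must leave at least 1 for each of the remaining parts.
    for first in range(min(largest, total - (parts - 1)), 0, -1):
        for rest in partitions(total - first, parts - 1, first):
            yield (first,) + rest


def pattern_count(pattern):
    """Number of sample points whose sorted per-day call counts equal `pattern`."""
    # Ways to decide which day receives which count.
    day_arrangements = factorial(len(pattern))
    for multiplicity in Counter(pattern).values():
        day_arrangements //= factorial(multiplicity)
    # Ways to split the distinct calls among the days (multinomial coefficient).
    call_splits = factorial(sum(pattern))
    for c in pattern:
        call_splits //= factorial(c)
    return day_arrangements * call_splits


CALLS, DAYS = 12, 7
counts = {p: pattern_count(p) for p in partitions(CALLS, DAYS)}
favorable = sum(counts.values())
probability = favorable / DAYS ** CALLS

for pattern, count in counts.items():
    print(pattern, count)
print(favorable, probability)
```

Summing over patterns and using inclusion-exclusion are two independent routes to the same numerator, so agreement between them is a useful sanity check on the hand count.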
Thus, all pairs cancel and the sum is zero. This would complete the problem, since the desired limit is the exponential of this one. This is accomplished in an indirect way, by working with differences, which avoids dealing with the factorial.
Moreover, the partial sums must approach a limit. Unordered Ordered 1. Same as a. The number of or- 6! If the k objects were distinguishable then there would be k! Since we have k1 ,.
There are k1! Thus there would be k1! Think of the m distinct numbers as m bins. Selecting a sample of size k, with replacement, is the same as putting k balls in the m bins. Note that, to create all of the bootstrap samples, we do not need to know what the original sample was. We only need to know the sample size and the distinct values. The probability of obtaining the corresponding average of such outcome is n! See also Lemma 2. By Exercise 1. A, B and C are a partition.
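The bins-and-balls argument for bootstrap samples above can be illustrated with a short enumeration. Placing k balls into m bins gives $\binom{k+m-1}{k}$ unordered samples, which is the standard count for sampling with replacement when order is ignored. The data values below are hypothetical; as the text notes, only the sample size and the distinct values matter.

```python
from itertools import combinations_with_replacement
from math import comb


def count_unordered_samples(m, k):
    """Number of unordered size-k samples, with replacement, from m distinct values."""
    return comb(k + m - 1, k)


def enumerate_unordered_samples(values, k):
    """List every unordered size-k sample drawn with replacement from the distinct values."""
    distinct = sorted(set(values))
    return list(combinations_with_replacement(distinct, k))


values = [2, 4, 9, 12]  # hypothetical original sample with four distinct values
samples = enumerate_unordered_samples(values, len(values))

# The enumeration and the bins-and-balls formula should agree.
assert len(samples) == count_unordered_samples(len(set(values)), len(values))
print(len(samples))
print(samples[:5])
```

Each tuple corresponds to one way of distributing the k balls among the m bins, so the length of the list is the binomial coefficient from the counting argument.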
This could be calculated directly, as in Example 1. Suppose A and B are mutually exclusive, with $P(A) > 0$ and $P(B) > 0$. Then $P(A \cap B) = 0 \neq P(A)P(B)$. Thus A and B cannot be independent.
The other arguments are similar. If all of the Ai are equal, all of the probabilities in the inclusion-exclusion identity are the same. Therefore B is the set of all subsets of X.
We must verify each of the three properties in Definition 1. There are $7^7$ equally likely sample points. The possible values of X3 are 0, 1 and 2. The number of sample points that give each of these patterns is given below. Thus we only need to check the limit, and that they are nondecreasing a.
See Example 1. Thus $F_Y(y)$ is right continuous. The probabilities are obtained by counting arguments, as follows. Thus for all y, $F_Y$ is nondecreasing. Use Theorem 2. Let X be a random variable with density $f(x)$. Then by differentiating 2. c. Theorem 2. If the sets B1 , B2 ,. So this says that we can apply Theorem 2. Note that on A1 we are essentially using Example 2. We prove part (b), which is equivalent to part (a). The assumptions that are needed are the ones listed in Theorem 2.
b. There are many examples; here are three. The standard normal pdf, Example 2. The Cauchy pdf, Example 2. The uniform(0, 1) pdf, Example 2.
The standard normal pdf. The uniform on the interval (0, 1). For the case when the mode is unique, let a be the point of symmetry and b be the mode. Thus a is the mode. For the case when the mode is not unique, there must exist an interval $(x_1, x_2)$ such that $f(x)$ has the same value in the whole interval.
Thus $f(x)$ is unimodal and 0 is the mode.
As a graph will show, iii is most peaked, i is next, and ii is least peaked. The graph looks very similar to Figure 2. The mgf of f1 is eK1 t. The mgf of f2 is eK2 t. So the sample size must be at least The only occurrence in the first four seconds, for which the pedestrian does not wait the entire four seconds, is to have a car pass in the first second and no other car pass.
If the new and old drugs are equally effective, then the probability that the new drug is effective on a case is. Hence, we cannot conclude the new drug is better. Therefore, each theater should have at least seats, and the answer based on the approximation equals the exact answer.
We can think of the 60 children entering kindergarten as 60 independent Bernoulli trials, where a success is a twin birth. The probability of having 5 or more successes approximates the probability of having 5 or more sets of twins entering kindergarten. Let X be the number of elementary schools in New York state that have 5 or more sets of twins entering kindergarten.
Let X be the number of States that have 5 or more sets of twins entering kindergarten during any of the last ten years.
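The binomial tail probability used in the twins argument can be computed directly. The sketch below is only illustrative: the twin-birth probability is not given in the text above, so `P_TWIN` is an assumed placeholder value, and `binomial_tail` is a hypothetical helper name.

```python
from math import comb


def binomial_tail(n, p, k):
    """P(X >= k) for X ~ binomial(n, p), computed through the complement P(X <= k - 1)."""
    lower = sum(comb(n, j) * p ** j * (1 - p) ** (n - j) for j in range(k))
    return 1.0 - lower


N_CHILDREN = 60   # number of Bernoulli trials, from the text
P_TWIN = 1 / 90   # ASSUMPTION: illustrative success probability, not taken from the text

p_five_or_more = binomial_tail(N_CHILDREN, P_TWIN, 5)
print(p_five_or_more)
```

The same tail probability can then serve as the success probability for a second binomial count over schools or states, which is the structure the surrounding sentences describe.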
Then X and Y have the specified binomial and hypergeometric distributions, respectively. VarXT b.
This will establish the formula. Use 3. Thus, by Exercise 2. Calculation of EY and EY 2 cannot be done in closed form. The uniform pdf satisfies the inequalities of Exercise 2. In Exercise 3. The pdf is symmetric about 0, so 0 must be the median. This is a special case of Exercise 3.
From Example 3.
Chapter 4 Multiple Random Variables
4. The proof is the same as for Theorem 2. The given probabilities are obtained by noting the following equivalences of events. The way this integral is calculated depends on the values of x and y. The random variables A and B are independent uniform(1, 2) variables. This is not a cross-product set.
Therefore, U and V are not independent. Suppose the length of the stick is 1. Let X and Y denote the two points where the stick is broken. Let X and Y both have uniform(0, 1) distributions, and assume X and Y are independent.
Then the joint distribution of X and Y is uniform on the unit square. In order for the three pieces to form a triangle, the sum of the lengths of any two pieces must be greater than the length of the third.
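The triangle condition for the broken stick can be explored with a small simulation. The sketch below draws the two break points uniformly on the unit interval and applies the triangle inequality, which for pieces summing to 1 is equivalent to every piece being shorter than 1/2. The function names, seed, and trial count are illustrative choices; the well-known exact answer for this classical problem is 1/4, so the estimate should land near that value.

```python
import random


def forms_triangle(x, y):
    """Return True if breaking a unit stick at points x and y yields a triangle."""
    a, b = min(x, y), max(x, y)
    pieces = (a, b - a, 1.0 - b)
    # Each piece must be shorter than the sum of the other two, i.e. shorter than 1/2.
    return max(pieces) < 0.5


def estimate_triangle_probability(n_trials, seed=0):
    """Monte Carlo estimate of P(triangle) with independent uniform(0, 1) break points."""
    rng = random.Random(seed)
    hits = sum(forms_triangle(rng.random(), rng.random()) for _ in range(n_trials))
    return hits / n_trials


estimate = estimate_triangle_probability(200_000)
print(estimate)
```

Because (X, Y) is uniform on the unit square, the probability equals the area of the region where the condition holds, which is what the simulation approximates.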
Draw a graph of this set. The cross term can be shown to be zero by iterating the expectation. Equation 2. In Example 4. Since the joint pmf factors into a function of u and a function of v, U and V are independent. That is, r and s have no common factors. This transformation is not one-to-one because you cannot determine the sign of $X_2$ from $Y_1$ and $Y_2$. From 4. We see in the above expression that the joint pdf factors into a function of $y_1$ and a function of $y_2$.
So $Y_1$ and $Y_2$ are independent. $Y_1$ is the square of the distance from $(X_1, X_2)$ to the origin. $Y_2$ is the cosine of the angle between the positive $x_1$-axis and the line from $(X_1, X_2)$ to the origin. So independence says the distance from the origin is independent of the orientation as measured by the angle. So Z and W are independent. It remains to show that they are independent. Proceed as in Exercise 4.
By Theorem 2. The probability of choosing between each one of these intervals is From the discussion in the text we have that f x1 ,. By Theorem 4. Then n! We will compute the marginal of X. The calculation for Y is similar. We will do this in the standard case. Thus, by part a , U is normal. Simply plug the expressions for aX , bX , etc.
In either case there are an infinite number of points satisfying the equations. So Z and Y always have the same sign, hence they cannot be bivariate normal. Thus X and Y are independent. By Example 4. Thus Ar is nondecreasing in Pnr. The following Mathematica code will draw the picture; the solid lines are B1 and the dashed lines are B2.
Note that the solid lines increase with $x_1$, while the dashed lines are constant. Thus B1 is informative, as the range of $X_2$ changes. But $e^z$ is linear on an interval only if the interval is a single point. Let a and b be real numbers. The case when $g(x)$ and $h(x)$ are both nonincreasing can be proved similarly.
Chapter 5 Properties of a Random Sample
5. Although all of the calculations here are straightforward, there is a tedious amount of bookkeeping needed. It seems that induction is the easiest route. Let Sn denote the variance based on n observations. For each of these, the entire expectation is nonzero for only two values of k when k matches either i or j. Thus the correlation coefficient is equal to 1. Square the random variable in part b. Of course, many choices of y will do, but this one makes calculations easy.
The choice is prompted by the exponential term in the pdf. The random variable qFq,p can be thought of as the sum of q random variables, each a tp squared. Then, for general n we have P max X 1 ,. Then, from Theorem 5. So the joint pdf of Y1 ,. From Theorem 5. It can be checked that the product of these marginal pdfs is the joint pdf given above. Consider two cases, depending on which of i or j is greater.
Using the formulas from Theorems 5. From Example 5. For the exact calculations, use the fact that Vn is itself distributed negative binomial 10r, p. The results are summarized in the following table. Note that the recursion relation of problem 3. Notice that the continuity correction gives some improvement over the uncorrected normal approximation.
Since X1 ,. By (i) and (ii) the result follows. The exact calculation can be done in Mathematica. The answer can also be simulated in Mathematica or in R.
By Lemma 5. Thus Y is the sum of independent gamma random variables. By Exercise 4. See Example 2. In fact, all odd moments of X are 0. Thus, the first three moments of X all agree with the first three moments of a n(0, 1).
The fourth moment is not easy to get; one way to do it is to get the mgf of X. This is a lengthy calculation. The Metropolis Algorithm is used to generate variables. Among other options one can choose the variables in positions to or the ones in positions , , Now, follow the algorithm on page
Chapter 6 Principles of Data Reduction
6. This is a difficult problem. The order statistics are a minimal sufficient statistic.
e. Fix sample points x and y. The second term is constant on such an interval. This will be true for all such intervals if and only if the order statistics for x are the same as the order statistics for y.
Therefore, the order statistics are a minimal sufficient statistic. From Example 6. So, $(X_{(1)}, X_{(n)})$ is not complete. That provides the opportunity to construct an unbiased, nonzero estimator of zero. These are all location families. The last vector depends only on Z1 ,. For (c), (d), and (e) the order statistics are sufficient, so Y1 ,. For (b), $X_{(1)}$ is sufficient. Then the joint pdf of Y1 ,. Use Theorem 6.
A minimal sufficient statistic may contain an ancillary statistic. Then Y1 and Y2 are iid and, by Theorem 2. Thus, by Theorem 3. Let M X denote the median calculated from X1 ,. This quadratic graph is a line and does not contain a two-dimensional open set. Use the same factorization as in Example 6. Thus, i Xi is a complete, sufficient statistic by Theorems 6. If the expectation exists, this is an analytic function which cannot be identically zero.
Second Edition 6. Hence the family is not complete. This is a polynomial of degree 2 in p. To make it zero for all p each coefficient must be zero. For d , the order statistics are minimal sufficient. This is a location family. Thus, by Example 6. So this sufficient statistic is not complete. X is sufficient because it is the data. So the family is not complete. X is sufficient by Theorem 6. By Example 6. Because i Xi is a one-to-one function of log i Xi , i Xi is also a complete sufficient statistic.
From Exercise 6. So if Z1 ,. Thus, as in Exercise 6. Do part b first showing that i Xi2 is a minimal sufficient statistic. Thus, i Xi , i Xi is a minimal sufficient statistic. By Theorem 6. Thus, X, statistic. This can be accomplished using the methods from Section 4. This is a two-to-one transformation. From Theorem 6. By Exercise 6. Because they are independent, by Theorem 4. Thus evidence is equal whenever the likelihood functions are equal, and this follows from Formal Sufficiency and Conditionality.
Equations 6. To prove the Conditionality Principle.
Now consider the Formal Sufficiency Principle. From the likelihood principle, inference about p is only through $L(p \mid x)$. The values of the likelihood are 1, p, $p^2$, and $p^3$, and the sample size does not directly influence the inference. In each pair, $X_i$ and $Y_i$ are independent, so W and V are independent.
Hence, (W, V) defined as in part (ii) is sufficient. Thus, we do not have equivariance.
Because X1 ,. The formal structures for the problem involving X and the problem involving Y are the same. They both concern a random sample of size n from a normal population and estimation of the mean of the population.
The distribution of X1 ,. An estimator of the form $kS^2$ is invariant. $G_2$ and $G_3$ are both subgroups of $G_1$. So invariance with respect to $G_1$ implies invariance with respect to $G_2$ and $G_3$. The transformations in $G_2$ leave the scale parameter unchanged. An estimator of the given form is invariant if, for all a and x1 ,.
These values are in the following table.
But it is usually best to do as much as possible analytically first, and perhaps reduce the complexity of the numerical problem. Substitute this into L. Many computer programs can be used to maximize this function.
Then from Example 7. Thus k is the MLE. The roots are. Because it is the only place where the first derivative is zero, it is also a global maximum. But if n is small, the bias is quite large. In Example 7. Usually the midpoint of this interval is taken as the MLE. This is the same as Exercise 6. This involved algebra can be found in Schwarz and Samanta This is a special case of the computation in Exercise 7. The posterior distributions are just the normalized likelihood times prior, so of course they are different.
Two answers are provided. First, use the Miscellanea: For k! So the MLEs are the same as the method of moments estimators in part a. Therefore it is a maximum. By Corollary 4. Let n(a, b) denote the pdf of a normal distribution with mean a and variance b.
This also completes part c. We will use the results and notation from part b to do this special case. The joint density is the product of the individual densities. Note that $(X_{(1)}, X_{(n)})$ is not a minimal sufficient statistic (recall Exercise 5.). Then $\max_i X_i$ is minimal sufficient. Both a and b are exponential families, and this condition is satisfied for all exponential families. Hence the estimator is unbiased. Therefore, we need to minimize $\sum_i a_i^2$.
This does not undermine the importance of mathematics and rigor; indeed, we have found that these have become more important.
But the manner in which they are applied is changing. For those familiar with the first edition, we can summarize the changes succinctly as follows. Discussion of asymptotic methods has been greatly expanded into its own chapter. There is more emphasis on computing and simulation see Section 5. We have de-emphasized the more specialized theoretical topics, such as equivariance and decision theory, and have restructured some material in Chapters for clarity. There are two things that we want to note.
First, with respect to computer algebra programs, although we believe that they are becoming increasingly valuable tools, we did not want to force them on the instructor who does not share that belief.
Thus, the treatment is "unobtrusive" in that it appears only in an appendix, with some hints throughout the book where it may be useful. Second, we have changed the numbering system to one that facilitates finding things. Now theorems, lemmas, examples, and definitions are numbered together; for example, Definition 7.
The first four chapters have received only minor changes. The previous coverage of invariance, which was in Chapters of the first edition, has been greatly reduced and incorporated int. Chapter 7 has been expanded and updated, and includes a new section on the EM algorithm. Chapter 8 has also received minor editing and updating. Unfortunately, coverage of randomized block designs has been eliminated for space reasons. Chapter 12 covers regression with errors-in-variables and contains new material on robust and logistic regression.
After teaching from the first edition for a number of years, we know approximately what can be covered in a one-year course. Finally, it is almost impossible to thank all of the people who have contributed in some way to making the second edition a reality and helping us correct the mistakes in the first edition.
To all of our students, friends, and colleagues who took the time to send us a note or an e-mail, we thank you. A number of people made key suggestions that led to substantial changes in presentation. Sometimes these suggestions were just short notes or comments, and some were longer reviews. Some were so long ago that their authors may have forgotten, but we haven't. We also owe much to Jay Beder, who has sent us numerous comments and suggestions over the years and possibly knows the first edition better than we do, and to Michael Perlman and his class, who are sending comments and corrections even as we write this.
This book has seen a number of editors. We thank Alex Kugashev, who in the mids first suggested doing a second edition, and our editor, Carolyn Crockett, who constantly encouraged us. Perhaps the one person other than us who is most responsible for this book is our first editor, John Kimmel, who encouraged, published, and marketed the first edition.
Thanks, John.
George Casella
Roger L. Berger
The first is "Why are you writing a book? You are writing a book because you are not entirely satisfied with the available texts.
The second question is harder to answer. The answer can't be put in a few sentences so, in order not to bore your audience (who may be asking the question only out of politeness), you try to say something quick and witty. It usually doesn't work. Logical development, proofs, ideas, themes, etc. When this endeavor was started, we were not sure how well it would work. The final judgment of our success is, of course, left to the reader.
The book is intended for first-year graduate students majoring in statistics or in a field where a statistics concentration is desirable. The prerequisite is one year of calculus. Some familiarity with matrix manipulations would be useful, but is not essential.