The projection matrix $P$,[1] sometimes also called the influence matrix[2] or hat matrix $H$, is central in regression analysis and plays an important role in regression diagnostics. With the least-squares estimate $b = \hat\beta = (X^T X)^{-1} X^T y$, the fitted values are

$$\hat Y = Xb = X(X^T X)^{-1} X^T Y = HY, \qquad H = X(X^T X)^{-1} X^T.$$

For any real matrix $Z$, the matrix $Z^T Z$ is symmetric, and therefore so is $(Z^T Z)^{-1}$ (when it exists). It follows that the hat matrix $H$ is symmetric too. For linear models, the trace of the projection matrix is equal to the rank of $X$.

Some facts of the projection matrix in this setting are summarized as follows.[4] When the errors are independent with common variance, $\Sigma = \sigma^2 I$, the element in the $i$th row and $j$th column of $P$ is equal to the covariance between the $j$th response value and the $i$th fitted value, divided by the variance of the former:

$$p_{ij} = \frac{\operatorname{Cov}[\hat y_i, y_j]}{\operatorname{Var}[y_j]}.$$

The residual vector is $e = (I_n - H)y$, with variance-covariance matrix $V = (I_n - H)\sigma^2$, where $I_n$ is the identity matrix of order $n$.

Generalized degrees of freedom (GDF) is defined as the sum of the sensitivities of each fitted value $\hat Y_i$ to perturbations in its corresponding observed response $Y_i$, that is, $\mathrm{GDF} = \sum_i \partial \hat Y_i / \partial Y_i$. For a fit of the form $\hat Y = HY$ this equals $\sum_i h_{ii} = \operatorname{tr}(H)$.

Matrix operations on block matrices can be carried out by treating the blocks as matrix entries. For example, $X$ can be decomposed by columns as $X = [A \;\; B]$. One use of such a partition is in the fixed effects model.

In the singular value decomposition of $X$, $U$ is a set of eigenvectors for $XX^T$ and $V$ is a set of eigenvectors for $X^T X$. The non-zero singular values of $X$ are the square roots of the non-zero eigenvalues of both $XX^T$ and $X^T X$.

Theorem 2.2. Let $H$ be a symmetric idempotent real-valued matrix. Then the eigenvalues of $H$ are all either 0 or 1.
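A minimal NumPy sketch of the definitions in this part, on made-up data: it builds $H = X(X^T X)^{-1} X^T$, checks that $H$ is symmetric and that its trace equals the rank of $X$, and estimates GDF by nudging each $Y_i$ and refitting. The helpers `hat_matrix` and `gdf_numeric` are hypothetical names of our own, and the perturbation size is an arbitrary choice.

```python
import numpy as np

def hat_matrix(X):
    """Hypothetical helper: H = X (X^T X)^{-1} X^T for a full-column-rank X."""
    # Solve (X^T X) Z = X^T rather than forming the inverse explicitly.
    return X @ np.linalg.solve(X.T @ X, X.T)

def gdf_numeric(X, y, delta=1e-4):
    """Hypothetical helper: sum over i of d(y_hat_i)/d(y_i), estimated by
    refitting OLS after nudging one observed response at a time."""
    base = X @ np.linalg.lstsq(X, y, rcond=None)[0]
    total = 0.0
    for i in range(len(y)):
        y_pert = y.copy()
        y_pert[i] += delta
        fitted = X @ np.linalg.lstsq(X, y_pert, rcond=None)[0]
        total += (fitted[i] - base[i]) / delta
    return total

rng = np.random.default_rng(0)
n = 8
# Made-up design: an intercept column plus two random covariates.
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
y = rng.normal(size=n)

H = hat_matrix(X)
print(np.allclose(H, H.T))                     # H is symmetric
print(np.trace(H), np.linalg.matrix_rank(X))   # trace(H) equals rank(X)
print(gdf_numeric(X, y))                       # close to trace(H) for a linear fit
```

Because an OLS fit is linear in $Y$, the finite-difference sensitivities are exact up to rounding, so the estimate should agree with $\operatorname{tr}(H)$; for a nonlinear fitting procedure the same loop would only approximate GDF.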
1 Hat Matrix

1.1 From Observed to Fitted Values

In the linear model $y = X\beta + \varepsilon$, $X$ is a matrix of explanatory variables (the design matrix), $\beta$ is a vector of unknown parameters to be estimated, and $\varepsilon$ is the error vector. The variable $Y$ is generally referred to as the response variable.

The OLS estimator was found to be given by the $(p \times 1)$ vector $b = (X^T X)^{-1} X^T y$. The predicted values $\hat y$ can then be written as

$$\hat y = Xb = X(X^T X)^{-1} X^T y =: Hy,$$

where $H := X(X^T X)^{-1} X^T$ is an $n \times n$ matrix that "puts the hat on $y$". We call this the "hat matrix" because it turns $Y$'s into $\hat Y$'s (Kutner et al.). Writing $C = X^T X$, the least-squares fitted values are $\hat y = X\hat\beta = XC^{-1}X^T y = Py$, so $P$ is a projection matrix, and each fitted value $\hat y_i$ is a linear combination of the observed values $y_j$. If $y$ has a multivariate normal distribution, then so does $\hat y = Hy$, since it is a linear transformation of $y$.

Geometrically, for a matrix $A$ with linearly independent columns, the closest point to a vector $b$ in the column space of $A$ is $Ax$, where the residual $b - Ax$ is orthogonal to that column space. A vector that is orthogonal to the column space of a matrix is in the nullspace of the matrix transpose, so

$$A^T(b - Ax) = 0 \;\Rightarrow\; A^T A x = A^T b \;\Rightarrow\; x = (A^T A)^{-1} A^T b.$$

Therefore, since $Ax$ is in the column space of $A$, the projection matrix that maps $b$ onto $Ax$ is $P\{A\} = A(A^T A)^{-1} A^T$; in particular, $P\{X\} = X(X^T X)^{-1} X^T$. The complementary matrices are $M\{A\} = I - P\{A\}$ and $M\{X\} = I - P\{X\}$. There are a number of applications of decomposing the design matrix by columns, $X = [A \;\; B]$.

The following property of the transpose holds: $(A^T)^T = A$, that is, the transpose of the transpose of $A$ is $A$ (the operation of taking the transpose is an involution).

Hat Matrix Properties
• The hat matrix is symmetric.
• The hat matrix is idempotent, i.e. $HH = H$.

However, this is not always the case; in locally weighted scatterplot smoothing (LOESS), for example, the hat matrix is in general neither symmetric nor idempotent.

Properties of the leverages $h_{ii}$:
1. $0 \le h_{ii} \le 1$ (can you show this?)
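The geometric derivation of $P\{A\}$ and $M\{A\}$ can be checked numerically. This sketch assumes NumPy and a random $A$ with linearly independent columns; `projection` and `residual_maker` are our own illustrative helper names.

```python
import numpy as np

def projection(A):
    """Hypothetical helper: P{A} = A (A^T A)^{-1} A^T (A has independent columns)."""
    return A @ np.linalg.solve(A.T @ A, A.T)

def residual_maker(A):
    """Hypothetical helper: M{A} = I - P{A}."""
    return np.eye(A.shape[0]) - projection(A)

rng = np.random.default_rng(1)
A = rng.normal(size=(6, 2))
b = rng.normal(size=6)

P = projection(A)
M = residual_maker(A)
x = np.linalg.solve(A.T @ A, A.T @ b)       # normal equations A^T A x = A^T b

print(np.allclose(P @ b, A @ x))            # P maps b onto Ax
print(np.allclose(A.T @ (b - A @ x), 0))    # residual lies in the nullspace of A^T
print(np.allclose(P @ P, P), np.allclose(M @ A, 0))  # P idempotent, M annihilates A
```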
The hat matrix is also positive semi-definite, since $x^T H x = x^T H^T H x = \|Hx\|^2 \ge 0$. Section 2 defines the hat matrix and derives its basic properties.

From the figure, it is clear that the closest point from the vector $b$ onto the column space of $A$ is $Ax$, and is one where we can draw a line orthogonal to the column space of $A$.

When the weights for each observation are identical and the errors are uncorrelated, the estimated parameters are $\hat\beta = (X^T X)^{-1} X^T y$, so the projection matrix (and hat matrix) is given by $P = X(X^T X)^{-1} X^T$. The above may be generalized to the cases where the weights are not identical and/or the errors are correlated. If the model contains an intercept, its column of ones should be treated exactly the same as any other column in the $X$ matrix.

The diagonal elements of the projection matrix are the leverages, which describe the influence each response value has on the fitted value for that same observation.

The formula for the vector of residuals $r$ can also be expressed compactly using the projection matrix:

$$r = y - \hat y = y - Py = (I - P)y.$$

If $\Sigma$ is the covariance matrix of the error vector (and by extension, the response vector as well), the covariance matrix of the residuals, by error propagation, equals

$$\Sigma_r = (I - P)^T \Sigma (I - P).$$

For the case of linear models with independent and identically distributed errors, in which $\Sigma = \sigma^2 I$, this reduces to:[3]

$$\Sigma_r = (I - P)\sigma^2.$$

Although the ANOVA hat matrix is not a projection matrix, it shares many of the same geometric properties as its parametric counterpart.

Theorem (Solution). Let $A \in \mathbb{R}^{m \times n}$ and $b \in \mathbb{R}^m$, and suppose that $AA^+ b = b$, where $A^+$ is the pseudoinverse of $A$. Then any vector of the form

$$x = A^+ b + (I - A^+ A)y, \qquad y \in \mathbb{R}^n \text{ arbitrary}, \tag{4}$$

is a solution of

$$Ax = b. \tag{5}$$

To show that $H\mathbf{1} = \mathbf{1}$, where $H = X(X'X)^{-1}X'$ is the hat matrix, one line of reasoning is: let $H = [r_1 \; r_2 \; \cdots \; r_n]'$, where $r_i$ is the $i$th row vector of $H$; it must then be shown that $r_i \mathbf{1} = 1$ (a scalar) for every row, starting with $r_1 \mathbf{1} = 1$.

For every $n \times n$ matrix $A$, the determinant of $A$ equals the product of its eigenvalues. Since the eigenvalues of a symmetric idempotent matrix such as $H$ are all either 0 or 1, its determinant is either 0 or 1.
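The pseudoinverse theorem can be illustrated with NumPy's `np.linalg.pinv`. As an assumption for the sketch, $A$ is a random wide matrix with full row rank, so the condition $AA^+b = b$ holds automatically while $I - A^+A$ is a non-trivial projector onto the nullspace of $A$; `general_solution` is a hypothetical helper.

```python
import numpy as np

rng = np.random.default_rng(2)
# Made-up wide matrix (2 equations, 4 unknowns) with full row rank, so A A^+ = I
# and the hypothesis A A^+ b = b holds for every b.
A = rng.normal(size=(2, 4))
b = rng.normal(size=2)
A_pinv = np.linalg.pinv(A)   # Moore-Penrose pseudoinverse A^+

def general_solution(y):
    """Hypothetical helper: x = A^+ b + (I - A^+ A) y for arbitrary y in R^n."""
    n = A.shape[1]
    return A_pinv @ b + (np.eye(n) - A_pinv @ A) @ y

print(np.allclose(A @ A_pinv @ b, b))                   # hypothesis of the theorem
x1 = general_solution(rng.normal(size=4))
x2 = general_solution(rng.normal(size=4))
print(np.allclose(A @ x1, b), np.allclose(A @ x2, b))   # both solve A x = b
print(np.allclose(x1, x2))                              # but they differ: not unique
```

Choosing $y = 0$ recovers the minimum-norm solution $A^+ b$; other choices of $y$ add components from the nullspace of $A$.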
The aim of regression analysis is to explain $Y$ in terms of $X$ through a functional relationship like $Y_i = f(X_i, \ast)$. Additional information about the samples is available in the form of $Y$.

The projection matrix has a number of useful algebraic properties. As $\hat y$ is usually pronounced "y-hat", the projection matrix $P$ is also named the hat matrix, as it "puts a hat on $y$": it makes $\hat y$, the predicted $y$, out of $y$, and describes the influence each response value has on each fitted value. The leverage of observation $i$ is the value of the $i$th diagonal term, $h_{ii}$, of the hat matrix $H$, where $H = X(X^T X)^{-1} X^T$. The rank of $X$ is the number of independent parameters of the linear model, and the right singular vectors of $X$ with non-zero singular values span the row space of $X$. As an extreme case, suppose the design matrix is the identity matrix: then $H = I$, and every fitted value equals its observed value.

Practical applications of the projection matrix in regression analysis include leverage and Cook's distance, which are concerned with identifying influential observations, i.e. observations which have a large effect on the results of a regression. As you can see, the two $x$ values furthest away from the mean have the largest leverages (0.176 and 0.163), while the $x$ value closest to the mean has a smaller leverage (0.048).

As Frank Wood notes (Linear Regression Models, Lecture 11, Slide 22), the residuals, like the fitted values $\hat Y_i$, can be expressed as linear combinations of the response observations $Y_i$.

The least-squares estimator is unbiased:

$$E(\hat\beta) = E\left((X'X)^{-1}X'Y\right) = (X'X)^{-1}X'E(Y) = (X'X)^{-1}X'X\beta = I_p\beta = \beta.$$

These estimates are normal if $Y$ is normal; otherwise they will be approximately normal in general.

The transpose of a sum is the sum of the transposes: $(A + B)^T = A^T + B^T$.

Or, by our definition of variances, that's the variance of $q^T\hat\beta$ plus the variance of $k^T y$ minus 2 times the covariance of $q^T\hat\beta$ and $k^T y$.
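For simple linear regression the leverages have a closed form, $h_{ii} = 1/n + (x_i - \bar x)^2 / \sum_j (x_j - \bar x)^2$, which makes the pattern in the leverage example easy to reproduce. The sketch below uses made-up $x$ values (not the data behind the leverages quoted in the text) and a hypothetical helper `leverages` that computes the diagonal of $H$ without forming the full matrix.

```python
import numpy as np

def leverages(X):
    """Hypothetical helper: diagonal of H = X (X^T X)^{-1} X^T without forming H."""
    # h_ii = x_i^T (X^T X)^{-1} x_i for each row x_i of X.
    XtX_inv_Xt = np.linalg.solve(X.T @ X, X.T)    # shape (p, n)
    return np.einsum("ij,ji->i", X, XtX_inv_Xt)

# Made-up predictor values; the mean is 5 and the value 14 lies far from it.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 14.0])
X = np.column_stack([np.ones_like(x), x])

h = leverages(X)
# Closed form for simple linear regression: 1/n + (x_i - xbar)^2 / Sxx
closed = 1 / len(x) + (x - x.mean()) ** 2 / np.sum((x - x.mean()) ** 2)
print(np.allclose(h, closed))
print(x[np.argmax(h)], x[np.argmin(h)])   # farthest from the mean: largest leverage
```

The point at the mean gets the smallest possible leverage, $1/n$, while the outlying $x$ value dominates.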
In statistics, the projection matrix $P$, sometimes also called the influence matrix or hat matrix $H$, maps the vector of response values (dependent variable values) to the vector of fitted values (or predicted values).

An idempotent matrix $M$ is a matrix such that $M^2 = M$. If $\lambda$ is an eigenvalue of such a matrix with eigenvector $x \neq 0$, then $\lambda x = Mx = M^2 x = \lambda^2 x$. So $\lambda^2 = \lambda$, and hence $\lambda \in \{0, 1\}$. By properties of a projection matrix, $H$ has $p = \operatorname{rank}(X)$ eigenvalues equal to 1, and all other eigenvalues are equal to 0.

Proof: The subspace inclusion criterion follows essentially from the definition of the range of a matrix.

Properties of $\hat\beta$ (Theorem 4.2).

OLS in Matrix Form. 1 The True Model: Let $X$ be an $n \times k$ matrix where we have observations on $k$ independent variables for $n$ observations.

In the classical application of the partition $X = [A \;\; B]$, $A$ is a column of all ones, which allows one to analyze the effects of adding an intercept term to a regression. In the fixed effects model, $B$ is a large sparse matrix of the dummy variables for the fixed-effect terms. Let $\mathbf{1}$ be the first column vector of the design matrix $X$.

Properties of the leverages (continued):
2. $\sum_{i=1}^n h_{ii} = p \;\Rightarrow\; \bar h = \frac{1}{n}\sum_{i=1}^n h_{ii} = p/n$ (show it).

The minimum possible value of $h_{ii}$ is $1/n$ for a model with a constant term.

(Note that $(X^T X)^{-1} X^T$ is the pseudoinverse of $X$.)[4]

Now we know that the covariance just factors out as twice the covariance, because in these cases the quantities are scalars.

Section 3 formally examines two …
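A quick numerical check of the eigenvalue and leverage facts, assuming NumPy and a made-up design with an intercept and $p = 4$ columns.

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 10, 4
# Made-up design: intercept column plus p - 1 random covariates.
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
H = X @ np.linalg.solve(X.T @ X, X.T)

# H is symmetric, so eigvalsh returns real eigenvalues in ascending order.
eig = np.linalg.eigvalsh(H)
print(np.round(eig, 8))             # n - p zeros followed by p ones

h = np.diag(H)
print(np.isclose(h.sum(), p))       # sum of leverages = p, so mean leverage = p/n
print(h.min() >= 1 / n - 1e-12)     # with an intercept, h_ii >= 1/n
print(h.max() <= 1 + 1e-12)         # and h_ii <= 1
```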
Since $M$ also has the property $MX = 0$, it follows from (3.11) that

$$X'e = 0. \tag{3.13}$$

We may write the explained component $\hat y$ of $y$ as

$$\hat y = Xb = Hy, \tag{3.14}$$

where

$$H = X(X'X)^{-1}X' \tag{3.15}$$

is called the "hat matrix", since it transforms $y$ into $\hat y$ (pronounced "y-hat"). The matrix $M$ is symmetric ($M' = M$) and idempotent ($M^2 = M$).

The present article derives and discusses the hat matrix and gives an example to illustrate its usefulness. The hat matrix is a matrix used in regression analysis and analysis of variance. It is defined as the matrix that converts values from the observed variable into estimations obtained with the least squares method.

Define the hat or projection operator as $P = X(X'X)^{-1}X'$, so that

$$\hat y = X\hat\beta = X(X'X)^{-1}X'y = Py = Hy.$$

Properties of the $P$ matrix: $P$ depends only on $X$, not on $y$. Similarly, define the residual operator as $M = I - P$. Just note that

$$\hat y = y - e = [I - M]y = Hy, \tag{31}$$

$$H = X(X'X)^{-1}X'. \tag{32}$$

Greene calls this matrix $P$, but he is alone. The $n \times 1$ vector of ordinary predicted values of the response variable is $\hat y = Hy$, where the $n \times n$ prediction or hat matrix $H$ is given by

$$H = X(X'X)^{-1}X'. \tag{1.4}$$

Suppose that the covariance matrix of the errors is $\Psi$. Then the generalized least-squares estimate is $\hat\beta_{\mathrm{GLS}} = (X^T \Psi^{-1} X)^{-1} X^T \Psi^{-1} y$, the hat matrix is $H = X(X^T \Psi^{-1} X)^{-1} X^T \Psi^{-1}$, and again it may be seen that $H^2 = H$, though now it is no longer symmetric.

If there are large blocks of zeros in a matrix, or blocks that look like an identity matrix, it can be useful to partition the matrix accordingly. One can use such a partition of $X$ to compute the hat matrix of $X$ without explicitly forming the matrix $X$, which might be too large to fit into computer memory.

In the leverage figure, three of the data points (the smallest $x$ value, an $x$ value near the mean, and the largest $x$ value) are labeled with their corresponding leverages.

Practice problem: let $A$ be an $n \times n$ matrix. Prove that if $A$ is idempotent, then $\det(A)$ is equal to either 0 or 1.
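The generalized (correlated-error) hat matrix can be sketched the same way in NumPy. The AR(1)-style covariance $\Psi_{ij} = \rho^{|i-j|}$ with $\rho = 0.6$ and the random design are purely assumptions chosen for illustration; any symmetric positive-definite $\Psi$ would do.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 6
X = np.column_stack([np.ones(n), rng.normal(size=n)])   # made-up design

# Assumed error covariance Psi: AR(1)-style correlation with rho = 0.6.
rho = 0.6
idx = np.arange(n)
Psi = rho ** np.abs(idx[:, None] - idx[None, :])
Psi_inv = np.linalg.inv(Psi)

# H = X (X^T Psi^{-1} X)^{-1} X^T Psi^{-1}
H_gls = X @ np.linalg.solve(X.T @ Psi_inv @ X, X.T @ Psi_inv)

print(np.allclose(H_gls @ H_gls, H_gls))   # still idempotent
print(np.allclose(H_gls, H_gls.T))         # but in general no longer symmetric
print(np.isclose(np.trace(H_gls), 2))      # trace still equals rank(X) = 2
```

Idempotence survives because $H$ still reproduces any vector already in the column space of $X$; symmetry is lost because the projection is oblique (orthogonal only in the $\Psi^{-1}$ inner product).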
Suppose that we wish to estimate a linear model using linear least squares. The model can be written as $y = X\beta + \varepsilon$. Since our model will usually contain a constant term, one of the columns in the $X$ matrix will contain only ones.

The projection matrix corresponding to a linear model is symmetric and idempotent, that is, $P^2 = P$. It has the following properties: it is idempotent, meaning $PP = P$, and it is symmetric. For a general matrix $A$ with linearly independent columns, the projection of a vector $b$ onto its column space is $A(A^T A)^{-1} A^T b$. The matrix $I - P$ is sometimes referred to as the residual maker matrix. In some derivations, we may need different $P$ matrices that depend on different sets of variables.

For other models such as LOESS that are still linear in the observations $y$, the projection matrix can be used to define the effective degrees of freedom of the model.[8] (Similarly, the effective degrees of freedom of a spline model is estimated by the trace of the projection matrix $S$, where $\hat Y = SY$.)

(The term "hat matrix" is due to John W. Tukey, who introduced us to the technique about ten years ago.)

Notice here that $u'u$ is a scalar or number (such as 10,000) because $u'$ is a $1 \times n$ matrix and $u$ is an $n \times 1$ matrix, and the product of these two matrices is a $1 \times 1$ matrix (thus a scalar).

(c) From the lecture notes, recall the definition of $A = Q^T W^T$, where $A$ is an $(n \times n)$ orthogonal matrix (i.e., …).

Proof: 1. $q^T\hat\beta$ is a scalar, and $k^T y$ is a scalar.
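To illustrate effective degrees of freedom for a smoother that is linear in $y$ but not a projection, the sketch below uses a Gaussian-kernel Nadaraya-Watson smoother as a simple stand-in (it is not LOESS). The function name `kernel_smoother_matrix`, the design points, and the bandwidths are all assumptions for the example.

```python
import numpy as np

def kernel_smoother_matrix(x, bandwidth):
    """Hypothetical helper: Nadaraya-Watson smoother matrix S with a Gaussian
    kernel, so that y_hat = S @ y."""
    d = (x[:, None] - x[None, :]) / bandwidth
    W = np.exp(-0.5 * d ** 2)
    return W / W.sum(axis=1, keepdims=True)   # each row of weights sums to 1

x = np.linspace(0.0, 1.0, 20) ** 2   # made-up, unevenly spaced design points
for bw in (0.02, 0.1, 0.5):
    S = kernel_smoother_matrix(x, bw)
    # Effective degrees of freedom = trace(S); S is not symmetric here.
    print(bw, round(np.trace(S), 2), np.allclose(S, S.T))
```

Widening the bandwidth lowers $\operatorname{tr}(S)$, i.e. the smoother uses fewer effective parameters; unlike an OLS hat matrix, $S$ is neither symmetric nor idempotent.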
Now we can use the SVD of $X$ to unveil the properties of the hat matrix. If $X$ has full column rank and thin SVD $X = UDV^T$ (with $U^T U = V^T V = I$ and $D$ diagonal with positive entries), then

$$H = UDV^T (VD^2V^T)^{-1} VDU^T = UU^T,$$

so $H$ is the orthogonal projection onto the column space of $X$. A symmetric idempotent matrix such as $H$ is called a perpendicular projection matrix: it satisfies $H^T = H$ and $H^2 = H \cdot H = H$.

We prove that if $A^T A = A$, then $A$ is a symmetric idempotent matrix: $A^T = (A^T A)^T = A^T A = A$, so $A$ is symmetric, and then $A^2 = A^T A = A$.

These properties of the hat matrix are of importance in, for example, assessing the amount of leverage or influence that $y_j$ has on $\hat y_i$, which is related to the $(i, j)$-th entry of the hat matrix. Each point of the data set tries to pull the ordinary least squares (OLS) line towards itself. However, the points farther away at the extreme of …

The hat matrix is calculated as $H = X(X^T X)^{-1} X^T$, and the estimated coefficients are naturally calculated as $\hat\beta = (X^T X)^{-1} X^T y$. The covariance matrix of $\hat\beta$ is $\operatorname{Cov}(\hat\beta) = \sigma^2 (X'X)^{-1}$.

Estimated covariance matrix of $b$: the vector $b$ is a linear combination of the elements of $Y$. In this case, the matrix …

Then, we can take the first derivative of this objective function in matrix form. The matrix criterion is from the previous theorem.

Source: https://en.wikipedia.org/w/index.php?title=Projection_matrix&oldid=992931373 (last edited on 7 December 2020; text available under the Creative Commons Attribution-ShareAlike License). References cited there include "Data Assimilation: Observation influence diagnostic of a data assimilation system" and "Proof that trace of 'hat' matrix in linear regression is rank of X".
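The SVD identity $H = UU^T$, and the related expression $(X^T X)^{-1} = VD^{-2}V^T$ for $\operatorname{Cov}(\hat\beta)/\sigma^2$, can be confirmed with NumPy's thin SVD (`full_matrices=False`). The random $X$ is made up and assumed to have full column rank.

```python
import numpy as np

rng = np.random.default_rng(5)
X = rng.normal(size=(7, 3))   # made-up design, full column rank

# Thin SVD: X = U diag(s) V^T, with U of shape (n, p).
U, s, Vt = np.linalg.svd(X, full_matrices=False)
H_svd = U @ U.T
H_direct = X @ np.linalg.solve(X.T @ X, X.T)

print(np.allclose(H_svd, H_direct))
# Nonzero singular values are square roots of the eigenvalues of X^T X.
print(np.allclose(np.sort(s ** 2), np.linalg.eigvalsh(X.T @ X)))
# Cov(beta_hat) / sigma^2 = (X^T X)^{-1} = V D^{-2} V^T
print(np.allclose(Vt.T @ np.diag(1 / s ** 2) @ Vt, np.linalg.inv(X.T @ X)))
# H^T H = H, so H is symmetric and idempotent.
print(np.allclose(H_svd.T @ H_svd, H_svd))
```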
If the vector of response values is denoted by $y$ and the vector of fitted values by $\hat y$, then $\hat y = Py$. The matrix $X$ is called the design matrix. In the language of linear algebra, the projection matrix is the orthogonal projection onto the column space of the design matrix $X$.[5][6] Many types of models and techniques are subject to this formulation; a few examples are linear least squares, smoothing splines, regression splines, local regression, kernel regression, and linear filtering.

Recall that $M = I - P$, where $P$ is the projection onto the linear space spanned by the columns of the matrix $X$. Therefore, when performing linear regression in matrix form, if $\hat{\mathbf{Y}}$ …

Important idempotent matrix property: for a symmetric and idempotent matrix $A$, $\operatorname{rank}(A) = \operatorname{trace}(A)$, the number of non-zero eigenvalues of $A$.

Let $A$ be a symmetric and idempotent $n \times n$ matrix, and let $x \neq 0$ be an eigenvector with eigenvalue $\lambda$. By the definition of eigenvectors and since $A$ is idempotent,

$$Ax = \lambda x \;\Rightarrow\; A^2 x = \lambda A x \;\Rightarrow\; Ax = \lambda A x = \lambda^2 x,$$

so $\lambda x = \lambda^2 x$, and since $x \neq 0$, $\lambda \in \{0, 1\}$.

$\hat\beta$ is an unbiased estimator of $\beta$.

Show that $H\mathbf{1} = \mathbf{1}$ for the multiple linear regression case ($p - 1 > 1$). Since $HX = X(X^T X)^{-1} X^T X = X$, $H$ leaves every column of $X$ unchanged; when the model has an intercept, $\mathbf{1}$ is a column of $X$, so $H\mathbf{1} = \mathbf{1}$, i.e. each row of $H$ sums to 1.

Here $p$ is the number of coefficients in the regression model, and $n$ is the number of observations.

Residuals: the residuals, …
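A small NumPy check that $H\mathbf{1} = \mathbf{1}$ when the intercept column is included (and fails without it), and that rank equals trace for this symmetric idempotent matrix; the random design is an assumption of the sketch.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 9
ones = np.ones(n)
# Made-up design whose first column is the vector of ones (intercept).
X = np.column_stack([ones, rng.normal(size=(n, 3))])
H = X @ np.linalg.solve(X.T @ X, X.T)

print(np.allclose(H @ ones, ones))   # every row of H sums to 1
print(np.allclose(H @ X, X))         # more generally, H X = X
print(np.linalg.matrix_rank(H), round(np.trace(H)))   # rank(H) = trace(H)

# Without the intercept column, the rows of H need not sum to 1.
Z = X[:, 1:]
H_no_int = Z @ np.linalg.solve(Z.T @ Z, Z.T)
print(np.allclose(H_no_int @ ones, ones))
```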
Hat Matrix and Leverages. Basic idea: use the hat matrix to identify outliers in $X$. Recall that $H = [h_{ij}]_{i,j=1}^n$ and $h_{ii} = X_i (X^T X)^{-1} X_i^T$, where $X_i$ is the $i$th row of $X$. The diagonal elements $h_{ii}$ are called leverages.

3. $h_{ii}$ is a measure of the distance between the $X$ values of the $i$th observation and the means of the $X$ values for all $n$ observations.

With the design matrix partitioned by columns as $X = [A \;\; B]$, the projection matrix can be decomposed as follows:[9]

$$P\{X\} = P\{A\} + P\{M\{A\}B\}, \qquad M\{A\} = I - P\{A\}.$$

The trace of a matrix is equal to the sum of its characteristic values (eigenvalues); thus $\operatorname{tr}(P)$ equals the number of eigenvalues equal to 1, which is $\operatorname{rank}(X)$.
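The blockwise formula and the use of leverages to spot outlying $X$ rows can both be sketched in NumPy. Here $A$ is the intercept column and $B$ holds made-up covariates, with the first row deliberately pushed far out; the cutoff $h_{ii} > 2p/n$ is a common rule of thumb that we assume for the example, not a threshold given in the text, and `proj` is our own helper name.

```python
import numpy as np

def proj(A):
    """Hypothetical helper: orthogonal projection onto the column space of A
    (full column rank assumed)."""
    return A @ np.linalg.solve(A.T @ A, A.T)

rng = np.random.default_rng(7)
n = 12
A = np.ones((n, 1))                 # classical application: the intercept column
B = rng.normal(size=(n, 3))         # made-up covariates
B[0] = 10.0                         # push the first row far from the others
X = np.hstack([A, B])

M_A = np.eye(n) - proj(A)           # residual maker for A (centers the columns)
P_block = proj(A) + proj(M_A @ B)   # P{X} = P{A} + P{M{A} B}
print(np.allclose(P_block, proj(X)))

# Leverages h_ii and the assumed 2p/n rule of thumb for high leverage.
h = np.diag(proj(X))
p = X.shape[1]
print(np.flatnonzero(h > 2 * p / n))   # indices of high-leverage rows
```

The block form never needs the full $X$ at once: it only combines the projection onto $A$ with the projection onto the part of $B$ not already explained by $A$.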