Post by Bruce Bradbury
Herman,
Thanks for the response. Addressing the first part of your comment
first: we are using z-scores for two main reasons.
Post by Bruce Bradbury
First, our research question is how much of the variation in y is
explained by x. So expressing the effect of a binary x in terms of
standard deviations of y seems most natural.
Post by Bruce Bradbury
Second, we are comparing data from two countries where our y variables
are composite scores (of cognitive ability) which are not measured
in precisely the same way (and indeed are usually reported in z-score
form). Data from other studies, using ability measures that are defined
the same way in the two countries, does suggest that the standard
deviation of ability is identical in the two countries - but I don't
think we need this assumption if we stick with the research question in
the previous paragraph.
Post by Bruce Bradbury
As I write this, it strikes me that a test of whether b is zero should
be the same as a test of whether R2 is zero (for which there is an F
test), and the latter should be the same for standardised or
unstandardised y. So is my question the same as asking whether we can
put a confidence interval on R2?
No, it is not the same. The difference is that the variance of y
depends on b, and unless one makes the assumption that x is normal,
the joint distribution of the estimates is essentially impossible to
calculate. You do get more information about b by treating the x's
as constants.
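To see the point numerically, here is a minimal Python sketch (my own
construction with simulated data, not anything from the thread): in
simple regression the t test of b = 0 and the F test of R2 = 0 give
identical answers (F = t^2), so the hypothesis tests agree; it is the
sampling error of bhat/shat(y) that is the separate problem.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    n = 200
    x = rng.normal(size=n)              # treated as fixed in the analysis
    y = 0.5 * x + rng.normal(size=n)    # true b = 0.5

    # OLS slope, residuals, and the t test of b = 0
    vx = np.var(x, ddof=1)
    bhat = np.cov(x, y, ddof=1)[0, 1] / vx
    resid = (y - y.mean()) - bhat * (x - x.mean())
    s2 = resid @ resid / (n - 2)
    t = bhat / np.sqrt(s2 / ((n - 1) * vx))
    p_t = 2 * stats.t.sf(abs(t), df=n - 2)

    # F test of R2 = 0
    r2 = 1 - (resid @ resid) / ((y - y.mean()) @ (y - y.mean()))
    F = r2 * (n - 2) / (1 - r2)
    p_F = stats.f.sf(F, 1, n - 2)

    print(t**2, F)     # equal: F = t^2 in simple regression
    print(p_t, p_F)    # identical p-values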
Post by Bruce Bradbury
Yet another way of doing this might be to estimate a standard regression
on unstandardised y, then divide b by the standard deviation of y. A
Taylor series expansion then suggests that the standard error of the
result will be a function of the standard error of b, the standard
error of s(y), and the covariance between the two estimates. Does your
statement about orthogonality imply that the covariance will be zero?
I was generalizing incorrectly from a simpler problem when I stated
asymptotic orthogonality. Look at the estimates of b and s(y):

    bhat = b + cov(x,e)/var(x),  shat(y)^2 = bhat^2*var(x) + var(e),

where the variances and covariances are sample quantities. The two
terms in shat(y)^2 are asymptotically independent, but bhat and
shat(y) are positively dependent, especially if b is large. This
reduces the error in bhat/shat(y).
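A quick Monte Carlo sketch (again my own, taking normal x and e with
unit variances) shows this dependence directly: the correlation between
bhat and shat(y) rises toward 1 as b grows.

    import numpy as np

    rng = np.random.default_rng(1)
    n, reps = 100, 20000
    x = rng.normal(size=n)          # one fixed design, reused throughout
    vx = np.var(x, ddof=1)

    for b in (0.2, 1.0, 3.0):
        bhats = np.empty(reps)
        shats = np.empty(reps)
        for r in range(reps):
            y = b * x + rng.normal(size=n)
            bhats[r] = np.cov(x, y, ddof=1)[0, 1] / vx
            shats[r] = y.std(ddof=1)
        print(b, np.corrcoef(bhats, shats)[0, 1])
    # the correlation rises with b, so the errors in bhat and shat(y)
    # partly cancel in the ratio bhat/shat(y)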
So consider bhat/shat(y). Write its reciprocal as qhat = shat(y)/bhat;
taking var(x) = 1, we have qhat^2 = 1 + var(e)/bhat^2, and the first
order terms in qhat - q are

    [(var(e) - E(var(e)))/b^2 - 2*cov(x,e)/b^3] / (2q).

How accurate this is for reasonable sized samples is difficult to decide,
but this is the first term in the expansion of the error.
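As a check, a simulation sketch of my own (it standardizes x and takes
var(e) = 1, so the expression applies as written with x held fixed)
compares the first order expression with the actual error of qhat:

    import numpy as np

    rng = np.random.default_rng(2)
    n, reps, b = 200, 50000, 1.5
    x = rng.normal(size=n)
    x = (x - x.mean()) / x.std(ddof=1)   # fixed design: mean 0, sample variance 1

    q = np.sqrt(1 + 1 / b**2)            # q = s(y)/b with var(x) = var(e) = 1
    actual = np.empty(reps)
    approx = np.empty(reps)
    for r in range(reps):
        e = rng.normal(size=n)           # var(e) = 1
        y = b * x + e
        yc = y - y.mean()
        bhat = (x @ yc) / (n - 1)        # OLS slope, since var(x) = 1
        resid = yc - bhat * x
        ve = resid @ resid / (n - 1)     # the "var(e)" in shat(y)^2 above
        cxe = (x @ e) / (n - 1)          # sample cov(x, e)
        actual[r] = y.std(ddof=1) / bhat - q
        approx[r] = ((ve - 1) / b**2 - 2 * cxe / b**3) / (2 * q)

    print(np.mean(actual), np.mean(approx))
    print(np.std(actual), np.std(approx))   # close for samples of this size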
--
This address is for information only. I do not claim that these views
are those of the Statistics Department or of Purdue University.
Herman Rubin, Department of Statistics, Purdue University
***@stat.purdue.edu Phone: (765)494-6054 FAX: (765)494-0558