Discussion:
Multivariate Research Design Help - MANOVA a good option?
D.
2005-10-31 04:08:11 UTC
Permalink
I have been teaching myself different multivariate statistical
techniques over the past few months to try to get a viable method to
use on my dataset, and am still somewhat confused.

I gathered data on about 200 people. My independent variables are
genotype at two genes (each genotype can be considered a binary
variable with two roughly equal sized groupings), gender, and season of
birth (a binary variable separating the year into halves). The
dependent variables of interest are about 15 continuous psychological
scales (based upon past research that derives these scales from factor
analysis of many other individual questions), and about 4 other
continuous variables of interest, such as the age at which subjects
think they will die. The dependents are often significantly
correlated with each other.

The independents are expected to have small and perhaps interactive
effects on the dependents. The analysis is meant to be exploratory. I
expect little to none of my "significant" results to hold up to
corrections for multiple testing.

I considered DFA or logistic regression using one of the binary
independent variables as a pseudo-dependent variable. The
pseudo-independent continuous variables would then be ranked by how
well they distinguish the two levels of the binary pseudo-dependent. I could
solve the problem of multicollinearity by doing a PCA on the continuous
variables. I decided against this approach because flipping
the dependent/independent relationship on its head is dubious, factors
found significant in DFA would have to be deconstructed to understand
what they are saying, and further analysis would need to be done to
evaluate interactions between the binary dependents.

MANOVA seems like a good alternative. It allows interaction effects and
has a fairly straightforward interpretation. However I have some
concerns:

-For my interpretation I plan to report the results for each
multivariate main effect and the univariate individual effects
regardless of whether the main effect is significant. I realize a common
technique is to only proceed to the individual effects if the main
effect is significant. Is my plan acceptable in an exploratory
analysis?
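As a sketch of that reporting plan: for a single binary factor the multivariate test can even be done by hand (the F transform of Wilks' lambda is exact for two groups), with the univariate tests reported alongside it regardless of the multivariate outcome. The data below are simulated placeholders, not my dataset; a full factorial MANOVA in SPSS would add the interactions.

```python
import numpy as np
from scipy import stats

def wilks_two_group(X, group):
    """One-way MANOVA for a single binary factor via Wilks' lambda.

    X: (n, p) array of dependent variables; group: boolean array of length n.
    For two groups the F transformation of Wilks' lambda is exact.
    """
    X = np.asarray(X, float)
    group = np.asarray(group, bool)
    n, p = X.shape
    g0, g1 = X[~group], X[group]
    # Within-group (W) and total (T) sums-of-squares-and-cross-products
    W = sum((g - g.mean(axis=0)).T @ (g - g.mean(axis=0)) for g in (g0, g1))
    Xc = X - X.mean(axis=0)
    T = Xc.T @ Xc
    lam = np.linalg.det(W) / np.linalg.det(T)      # Wilks' lambda
    df1, df2 = p, n - p - 1
    F = (1 - lam) / lam * df2 / df1
    return lam, F, stats.f.sf(F, df1, df2)

def univariate_followups(X, group):
    """Per-variable t-tests, reported regardless of the multivariate result."""
    X = np.asarray(X, float)
    group = np.asarray(group, bool)
    return [stats.ttest_ind(X[~group, j], X[group, j])
            for j in range(X.shape[1])]
```

Running the multivariate test first and the follow-ups unconditionally is exactly the exploratory plan above; only the gatekeeping step differs from the conventional protected procedure.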

-The effect of multicollinear dependents on a model is ambiguous. Some
say that correlated dependents are a serious problem
(http://www.matforsk.no/ola/ffmanova.htm), while others present a more
ambiguous case (How the Power of MANOVA Can Both Increase and Decrease
as a Function of the Intercorrelations Among the Dependent Variables.
Cole, David A.; Maxwell, Scott E.; Arvey, Richard; Salas, Eduardo.
Psychological Bulletin, Vol. 115(3), May 1994, pp. 465-474). Fooling
around with my model so far, I find that changing the number of
independents and dependents in the model changes my P values some, but
not a ton. How much should I be worrying about this assumption?

-Having four interacting binary independents with N=200 causes some
major stratification. I've read that no cell in the analysis should
have an N below 20, or alternatively that the N of the smallest cell
should not be outnumbered by the number of dependent variables. I may
lower my number of dependents to fit the latter rule if it is correct.
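To make the stratification concrete: four crossed binary factors give 2^4 = 16 cells, so N=200 averages only 12.5 subjects per cell. A quick tally on hypothetical random factors (not the real genotype data):

```python
import numpy as np

rng = np.random.default_rng(0)
N = 200
# Hypothetical binary factors: gene1, gene2, gender, season of birth
factors = rng.integers(0, 2, size=(N, 4))
# Encode each subject's cell as a number 0..15 and tally the 2**4 cells
cell = factors @ (2 ** np.arange(4))
counts = np.bincount(cell, minlength=16)
# The average cell holds N / 16 = 12.5 subjects, so with ~19 dependent
# variables many cells will be smaller than the number of dependents.
```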


Is MANOVA a good option for my needs? Would I be better off doing one,
two and three way ANOVAs and the nonparametric equivalents
individually?

Thanks for any comments you can provide.

.d
Paige Miller
2005-10-31 13:52:32 UTC
Permalink
Post by D.
I have been teaching myself different multivariate statistical
techniques over the past few months to try to get a viable method to
use on my dataset, and am still somewhat confused.
I gathered data on about 200 people. My independent variables are
genotype at two genes (each genotype can be considered a binary
variable with two roughly equal sized groupings), gender, and season of
birth (a binary variable separating the year into halves). The
dependent variables of interest are about 15 continuous psychological
scales (based upon past research that derives these scales from factor
analysis of many other individual questions), and about 4 other
continuous variables of interest, such as the age at which subjects
think they will die. The dependents are often significantly
correlated with each other.
The independents are expected to have small and perhaps interactive
effects on the dependents. The analysis is meant to be exploratory. I
expect little to none of my "significant" results to hold up to
corrections for multiple testing.
The analysis is meant to be exploratory! Great, then most of these
other things that you are concerned about really don't amount to much.
Go ahead and explore!
Post by D.
I considered DFA or logistic regression using one of the binary
independent variables as a pseudo-dependent variable. The
pseudo-independent continuous variables would then be ranked as to
which best distinguish between the binary pseudo-dependent. I could
solve the problem of multicollinearity by doing a PCA on the continuous
variables. I decided against this method because its method of flipping
the dependent/independent relationship on its head is dubious, factors
found significant in DFA would have to be deconstructed to understand
what they are saying, and further analysis would need to be done to
evaluate interactions between the binary dependents.
I don't understand taking an independent variable and handling it as a
"pseudo-dependent variable".

The problem with doing PCA on the Y variables is that it finds
directions in these variables that may or may not be the ones that are
related to the changes in the X variables. Not recommended by me.
Post by D.
MANOVA seems like a good alternative. It allows interaction effects and
has a fairly straightforward interpretation. However I have some
So far, MANOVA seems PERFECT for this data, although you haven't told
us an important detail -- are the errors in Y multivariate normally
distributed and independent from observation to observation? But other
than that, MANOVA was specifically designed for the case of multiple
correlated Y variables and categorical/class X variables.
Post by D.
-For my interpretation I plan to report the results for each
multivariate main effect and the univariate individual effects
regardless of whether the main effect is significant. I realize a common
technique is to only proceed to the individual effects if the main
effect is significant. Is my plan acceptable in an exploratory
analysis?
I explore like this all the time. I don't do confirmatory analysis like
this.
Post by D.
-The effect of multicollinear dependents on a model is ambiguous. Some
say that correlated dependents are a serious problem
(http://www.matforsk.no/ola/ffmanova.htm), while others present a more
ambiguous case (How the Power of MANOVA Can Both Increase and Decrease
as a Function of the Intercorrelations Among the Dependent Variables.
Cole, David A.; Maxwell, Scott E.; Arvey, Richard; Salas, Eduardo.
Psychological Bulletin, Vol. 115(3), May 1994, pp. 465-474). Fooling
around with my model so far, I find that changing the number of
independents and dependents in the model changes my P values some, but
not a ton. How much should I be worrying about this assumption?
I can see why the authors at matforsk.no were worried about correlated
dependent variables -- they are doing spectroscopy and have many many
many (much more than 14) HIGHLY (as in very very very HIGHLY, which is
typical for spectroscopy) correlated Y variables. I doubt you are in
such an extreme situation.
Post by D.
Having four interacting binary independents with N=200 causes some
major stratification. I've read that no cell in the analysis should
have an N below 20, or alternatively that the N of the smallest cell
should not be outnumbered by the number of dependent variables. I may
lower my number of dependents to fit the latter rule if it is correct.
I think rules like this are baloney. I think they apply in one
application and not another where the signal to noise ratios might be
very different. And furthermore, if a rule advises you that you have
TOO MANY data points or TOO MANY variables, well, there's a
misconception somewhere. The
problem with TOO MANY data points is that it can make any statistical
test significant, although the practical significance may be absent.
For exploratory data analysis, do not worry about this. And you can't
have too many variables -- you have the variables you have because that
is what the problem requires you to have.
Post by D.
Is MANOVA a good option for my needs?
I would certainly start with MANOVA.

And those are my opinionated opinions.

--
Paige Miller
***@itt.com
D.
2005-11-01 02:29:24 UTC
Permalink
Thanks.

Well, it is meant to be exploratory, but also published.
Post by Paige Miller
I explore like this all the time. I don't do confirmatory analysis like
this.
How would you go about doing a confirmatory analysis assuming you don't
gather more data?
Post by Paige Miller
So far, MANOVA seems PERFECT for this data, although you haven't told
us an important detail -- are the errors in Y multivariate normally
distributed and independent from observation to observation?
By independent I presume you mean that I am not asking the same
question twice to the same subject with different "treatments"? If so,
I am ok on that count.

I am a bit confused about multivariate normality in MANOVAs (and
univariate normality in terms of ANOVA for that matter). Much of what I
read seems to indicate that the statistical tests for normality are far
too sensitive, and that ANOVA/MANOVA is actually quite robust to
non-normal distributions. The recommendations I seem to get are to
take a good look at the plots, paying special attention to outliers and
to make the subjective judgment myself. I am uncomfortable with this,
but the other alternatives don't seem much more reasonable.
For example, this post seems to pooh-pooh the idea of testing for
multivariate normality:
http://groups.google.com/group/comp.soft-sys.sas/browse_thread/thread/ca3ef66d46dd531/8d70da1551295cc2?lnk=st&q=%22multivariate+normality%22&rnum=1#8d70da1551295cc2
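One informal graphical check I have seen suggested: under multivariate normality, the squared Mahalanobis distances of the observations from the centroid behave approximately like a chi-square with p degrees of freedom, so plotting sorted distances against chi-square quantiles gives a multivariate analogue of the usual normal QQ plot. A sketch on simulated data:

```python
import numpy as np
from scipy import stats

def mahalanobis_sq(X):
    """Squared Mahalanobis distances of each row from the sample mean."""
    X = np.asarray(X, float)
    Xc = X - X.mean(axis=0)
    Sinv = np.linalg.inv(np.cov(X.T, bias=False))
    return np.einsum('ij,jk,ik->i', Xc, Sinv, Xc)

def chi2_qq(X):
    """Pairs (theoretical chi-square quantile, observed distance) for a QQ plot.

    Under multivariate normality the squared distances are roughly
    chi-square with p degrees of freedom; gross curvature or stray points
    at the top of the plot flag non-normality or outliers.
    """
    d2 = np.sort(mahalanobis_sq(X))
    n, p = X.shape
    q = stats.chi2.ppf((np.arange(1, n + 1) - 0.5) / n, df=p)
    return q, d2
```

This stays in the spirit of "look at the plots and judge" rather than a formal significance test.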

This webpage presents the Box M test, but then states that "MANOVA
and MANCOVA assume that for each group (each cell in the factor design
matrix) the covariance matrix is similar. Box's M tests this
assumption. We want M not to be significant in order to conclude there
is insufficient evidence that the covariance matrices differ. Here M is
significant, so we have violated an assumption. That is, the various
music groups differ in their covariance matrices. Note, however, that
the F test is quite robust even when there are departures from this
assumption."
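For what it's worth, Box's M is simple enough to compute outside SPSS. A minimal sketch of the standard chi-square approximation, run here on simulated groups (and note the test's well-known oversensitivity at large N):

```python
import numpy as np
from scipy import stats

def box_m(samples):
    """Box's M test for equality of covariance matrices across groups.

    samples: list of (n_i, p) arrays, one per group.
    Returns the chi-square statistic, its degrees of freedom, and the
    p-value from the usual chi-square approximation.
    """
    k = len(samples)
    p = samples[0].shape[1]
    ns = np.array([s.shape[0] for s in samples])
    covs = [np.cov(s.T, bias=False) for s in samples]
    S_pool = sum((n - 1) * S for n, S in zip(ns, covs)) / (ns.sum() - k)
    M = ((ns.sum() - k) * np.log(np.linalg.det(S_pool))
         - sum((n - 1) * np.log(np.linalg.det(S))
               for n, S in zip(ns, covs)))
    # Small-sample correction factor for the chi-square approximation
    c = ((np.sum(1.0 / (ns - 1)) - 1.0 / (ns.sum() - k))
         * (2 * p ** 2 + 3 * p - 1) / (6.0 * (p + 1) * (k - 1)))
    df = p * (p + 1) * (k - 1) // 2
    chi2 = M * (1 - c)
    return chi2, df, stats.chi2.sf(chi2, df)
```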

I am having trouble sifting through this conflicting info. Any
recommendations you can offer, especially if they are tests I can do in
SPSS would be appreciated.

Best,
Dan
Paige Miller
2005-11-01 13:13:17 UTC
Permalink
Post by D.
Thanks.
Well, it is meant to be exploratory, but also published.
Post by Paige Miller
I explore like this all the time. I don't do confirmatory analysis like
this.
How would you go about doing a confirmatory analysis assuming you don't
gather more data?
I think I'm guilty of using "confirmatory" in a very loose sense. I
used the word to mean times when you state the hypotheses you want to
test before you ever look at the data; you validate your assumptions;
and so on -- a much more formal use of statistical methods.
Post by D.
Post by Paige Miller
So far, MANOVA seems PERFECT for this data, although you haven't told
us an important detail -- are the errors in Y multivariate normally
distributed and independent from observation to observation?
By independent I presume you mean that I am not asking the same
question twice to the same subject with different "treatments"? If so,
I am ok on that count.
Errors should be iid multivariate normal, to use the textbook lingo.
The errors should be independent from observation to observation.
Post by D.
I am a bit confused about multivariate normality in MANOVAs (and
univariate normality in terms of ANOVA for that matter). Much of what I
read seems to indicate that the statistical tests for normality are far
too sensitive, and that ANOVA/MANOVA is actually quite robust to
non-normal distributions.
I have read those same things. Generally, I don't test for normality.
Post by D.
The recommendations I seem to get are to
take a good look at the plots, paying special attention to outliers and
to make the subjective judgment myself.
This is very often what I do. However, if you will be publishing a
paper, that might not suffice.
Post by D.
I am uncomfortable with this,
but the other alternatives don't seem much more reasonable.
For example, this post seems to pooh-pooh the idea of testing for
multivariate normality:
http://groups.google.com/group/comp.soft-sys.sas/browse_thread/thread/ca3ef66d46dd531/8d70da1551295cc2?lnk=st&q=%22multivariate+normality%22&rnum=1#8d70da1551295cc2
This webpage presents the Box M test, but then states that "MANOVA
and MANCOVA assume that for each group (each cell in the factor design
matrix) the covariance matrix is similar. Box's M tests this
assumption. We want M not to be significant in order to conclude there
is insufficient evidence that the covariance matrices differ. Here M is
significant, so we have violated an assumption. That is, the various
music groups differ in their covariance matrices. Note, however, that
the F test is quite robust even when there are departures from this
assumption."
As before, theoretically (and to get papers published) you can't have
different covariance matrices in each cell. In practice, you can get
away with some differences, although I have no clue how to tell
(visually or otherwise) how robust the test would be to variation of
covariance matrices.
Post by D.
I am having trouble sifting through this conflicting info. Any
recommendations you can offer, especially if they are tests I can do in
SPSS would be appreciated.
There's a saying that goes something like this: if you ask ten
statisticians a question, you will get eleven answers. That seems to be
the problem you are having right now. My recommendation is to go ahead
and do something (MANOVA being the best choice I can think of right
now). Definitely plot your data, and plot the MANOVA results as well
(think biplots here). You will see the results and get a feel for
whether or not they make sense. I know that's not strictly kosher, but
then again, you are exploring! If people criticise you for not doing
things properly, or argue that MANOVA isn't right, use that as an
opportunity to learn.

I don't know SPSS, so I can't help you there.

--
Paige Miller
***@itt.com
Øyvind Langsrud
2005-11-02 14:22:36 UTC
Permalink
Post by Paige Miller
Post by D.
-The effect of multicollinear dependents on a model is ambiguous. Some
say that correlated dependents are a serious problem
(http://www.matforsk.no/ola/ffmanova.htm), while others present a more
I can see why the authors at matforsk.no were worried about correlated
dependent variables -- they are doing spectroscopy and have many many
many (much more than 14) HIGHLY (as in very very very HIGHLY, which is
typical for spectroscopy) correlated Y variables. I doubt you are in
such an extreme situation.
I am one of the "authors at matforsk.no" and the 50-50 MANOVA method at
http://www.matforsk.no/ola/ffmanova.htm is useful not only for "very
very very HIGHLY" correlated responses.

Very often the physical or psychological phenomenon that is measured is
(in practice) of a lower dimension than the number of measured
variables. Methodology (such as 50-50 MANOVA) that incorporates
dimension reduction can therefore be very useful.
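As a crude illustration of the dimension-reduction idea (this is only a stand-in, not the actual 50-50 MANOVA procedure, which is more refined; see the ffmanova page), one can project the responses onto the fewest principal components that cover a chosen fraction of the variance before testing:

```python
import numpy as np

def pc_scores(Y, explained=0.9):
    """Project responses onto the fewest leading principal components
    that together explain at least the requested variance fraction.

    Returns the component scores and the number of components kept.
    """
    Yc = Y - Y.mean(axis=0)
    _, s, Vt = np.linalg.svd(Yc, full_matrices=False)
    frac = np.cumsum(s ** 2) / np.sum(s ** 2)   # cumulative variance share
    m = int(np.searchsorted(frac, explained)) + 1
    return Yc @ Vt[:m].T, m
```

A MANOVA run on the scores then tests the dominant dimensions instead of all 15-19 raw responses.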
Post by Paige Miller
Post by D.
birth (a binary variable separating the year into halves). The
dependent variables of interest are about 15 continuous psychological
scales (based upon past research that derives these scales from factor
analysis of many other individual questions), and about 4 other
I see that Dan has already made a type of dimension reduction based on
factor analysis. So if 15 is "the correct dimension" then ordinary
MANOVA can be a good choice. But it is still probable that an analysis
that focuses on the most important dimensions can be more powerful.

If you want to try 50-50 MANOVA, free software is available at
http://www.matforsk.no/ola/program.htm. In addition to doing this
modified MANOVA, the program can also correct individual ANOVAs for
multiple testing in a non-conservative way (by using rotation testing).

Øyvind Langsrud
Richard Ulrich
2005-11-03 06:07:44 UTC
Permalink
I'm going back to this initial post, to disagree some with
Paige's advice. I agree that under the aegis of "exploratory",
almost anything is possible. I think it is nice for people to
look at various analyses to get accustomed to how they work.

But I don't think this is a very good strategy here, for learning
much.
Post by D.
I have been teaching myself different multivariate statistical
techniques over the past few months to try to get a viable method to
use on my dataset, and am still somewhat confused.
I gathered data on about 200 people. My independent variables are
genotype at two genes (each genotype can be considered a binary
variable with two roughly equal sized groupings), gender, and season of
birth (a binary variable separating the year into halves). The
dependent variables of interest are about 15 continuous psychological
scales (based upon past research that derives these scales from factor
analysis of many other individual questions), and about 4 other
continuous variables of interest, such as the age at which subjects
think they will die. The dependents are often significantly
correlated with each other.
The independents are expected to have small and perhaps interactive
effects on the dependents. The analysis is meant to be exploratory. I
expect little to none of my "significant" results to hold up to
corrections for multiple testing.
A MANOVA is a generalization of analyses where the simple
cases are multiple regression and 2-group or multiple-group
discriminant function. All of these are cases of "canonical
correlation". Multiple regression and 2-group DF have one
variable on the left and a number of variables on the right, and
result in one solution -- which is a solution that people regularly
post about, to ask us how to interpret. The other canonical
correlation cases have multiple variables on both sides, and result
in multiple "canonical factors", each of which offers the same
challenge.

Even with an N of 200, I don't like the idea of doing a multiple
regression with 15 psychological factors plus 4 others, with
hopes of learning much about those 'predictors.' The largest
correlation might have the largest loading, and that would be
encouraging about that one. But for those scales, you are
quickly analyzing artifacts of "what is left".

I would not mind plotting out my first and second canonical
roots, based on all the dichotomously-formed groups, and
some of the variables. Then I look at the loadings (not the
regression coefficients). However, I suspect that 19 might be
too many, so I would be ready to reduce that count. There
aren't 15 psychological "dimensions" if you use factor analysis
first.
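The canonical machinery above can be sketched in plain linear algebra. Everything below is a hypothetical placeholder (simulated blocks, one built-in association), just to show where the correlations and the loadings come from:

```python
import numpy as np

def canonical_corr(X, Y):
    """Canonical correlations and loadings for two blocks of variables.

    Returns the canonical correlations (descending) and the 'loadings':
    correlations of each original variable with its block's canonical
    scores -- the quantities to inspect instead of the raw coefficients.
    """
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    n = len(X)

    def inv_sqrt(S):
        # Symmetric inverse square root via the eigendecomposition
        w, V = np.linalg.eigh(S)
        return V @ np.diag(1.0 / np.sqrt(w)) @ V.T

    Sxx = X.T @ X / (n - 1)
    Syy = Y.T @ Y / (n - 1)
    Sxy = X.T @ Y / (n - 1)
    # Singular values of the whitened cross-covariance are the canonical r's
    U, r, Vt = np.linalg.svd(inv_sqrt(Sxx) @ Sxy @ inv_sqrt(Syy),
                             full_matrices=False)
    xs = X @ (inv_sqrt(Sxx) @ U)       # canonical scores, X side
    ys = Y @ (inv_sqrt(Syy) @ Vt.T)    # canonical scores, Y side
    px, py = X.shape[1], Y.shape[1]
    load_x = np.corrcoef(np.hstack([X, xs]).T)[:px, px:]
    load_y = np.corrcoef(np.hstack([Y, ys]).T)[:py, py:]
    return r, load_x, load_y
```

With few X variables and many Y variables, the later canonical roots are exactly the "artifacts of what is left" worry: they soak up sampling noise and over-fit.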
Post by D.
I considered DFA or logistic regression using one of the binary
independent variables as a pseudo-dependent variable. The
pseudo-independent continuous variables would then be ranked as to
which best distinguish between the binary pseudo-dependent. I could
solve the problem of multicollinearity by doing a PCA on the continuous
variables. I decided against this method because its method of flipping
the dependent/independent relationship on its head is dubious, factors
I have trouble keeping track of what is supposed to be
"dependent" versus "independent", since (usually) the tests
come out EXACTLY the same despite the analysis. For psychological
tests, Principal Factors (rather than Components) are preferred, if
you are going to create pragmatic composites of similar scores --
that reduces the later problem of interpretation.
Post by D.
found significant in DFA would have to be deconstructed to understand
what they are saying, and further analysis would need to be done to
evaluate interactions between the binary dependents.
MANOVA seems like a good alternative. It allows interaction effects and
has a fairly straightforward interpretation. However I have some
MANOVA gives you canonical factors, with all the problems
of interpretation. If the N is too small for the problem, the
interpretation is *much* confounded by over-fitting.
Post by D.
-For my interpretation I plan to report the results for each
multivariate main effect and the univariate individual effects
regardless of whether the main effect is significant. I realize a common
technique is to only proceed to the individual effects if the main
effect is significant. Is my plan acceptable in an exploratory
analysis?
-The effect of multicollinear dependents on a model is ambiguous. Some
say that correlated dependents are a serious problem
(http://www.matforsk.no/ola/ffmanova.htm), while others present a more
ambiguous case (How the Power of MANOVA Can Both Increase and Decrease
as a Function of the Intercorrelations Among the Dependent Variables.
Cole, David A.; Maxwell, Scott E.; Arvey, Richard; Salas, Eduardo.
Psychological Bulletin, Vol. 115(3), May 1994, pp. 465-474). Fooling
around with my model so far, I find that changing the number of
independents and dependents in the model changes my P values some, but
not a ton. How much should I be worrying about this assumption?
I think you should check the final chapter of Cohen's book
(1990 or so edition) to see how badly the power is affected
by too many variables on both sides of the equation.
Post by D.
Having four interacting binary independents with N=200 causes some
major stratification. I've read that no cell in the analysis should
have an N below 20, or alternatively that the N of the smallest cell
should not be outnumbered by the number of dependent variables. I may
lower my number of dependents to fit the latter rule if it is correct.
Is MANOVA a good option for my needs? Would I be better off doing one,
two and three way ANOVAs and the nonparametric equivalents
individually?
...

Is there a special status to the non-scale data?

I would decide whether they are supposed to be tested
after a few specific scale-scores, to see if they contribute,
or else, put them in before the scale-scores, to see if they
wipe them out. Or whatever.

Is there a *lot* that shows as univariate effects, to be sifted
through, or are you scratching to find any clue to anything?
--
Rich Ulrich, ***@pitt.edu
http://www.pitt.edu/~wpilib/index.html