Discussion:
Comparing group differences when data is Pareto distributed
Bruce Weaver
2014-03-28 22:54:07 UTC
Hello,
A recent article by O'Boyle & Aguinis (2012) demonstrated that most individual job performance follows a Pareto rather than a normal distribution. This is a VERY big deal, because most statistical techniques used by psychology and management researchers assume normally distributed data. If tools like ANOVA, regression, and other variations on the general linear model can no longer be used to analyze job performance data correctly, what alternatives (other than chi-square) are available in SPSS? What tools can be used, for example, to compare groups when the data are Pareto rather than normally distributed? Any guidance would be appreciated.
Thank you!
Your question is not really about SPSS, so is probably more appropriate
for a group like sci.stat.consult (where I have cross-posted this reply).

I assume you are referring to this article:

http://hrprofessionalsmagazine.com/wp-content/uploads/2013/03/Normality-of-Performance-Paretian-Theory.pdf

I don't have time to read it right now, but here are a few quick comments.

1. As George Box observed, nothing in nature is normally distributed.
The normal distribution is an approximation that can serve as a useful
model. (He also said there's no such thing as a straight line in
nature, but that linear fits can provide useful models.)

2. Methods that assume normality typically assume normality of the
(model fitting) errors, or normality of the sampling distribution of
some statistic, not normality of the raw data.

3. As Herman Rubin and others have said many times, normality of the
errors is far less important than the independence of the errors and
homoscedasticity (i.e., the "identically distributed" part of
"independently and identically distributed").
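To illustrate point 2, here is a minimal sketch in Python (standard library only). The two simulated groups, their means, and the helper excess_kurtosis are assumptions made up for this example, not anything taken from the article. The pooled raw scores are strongly bimodal, yet the residuals around the group means are close to normal, and the residuals are what the model's normality assumption refers to.

```python
import random
import statistics


def excess_kurtosis(xs):
    """Moment-based excess kurtosis (about 0 for normal data)."""
    m = statistics.fmean(xs)
    n = len(xs)
    m2 = sum((x - m) ** 2 for x in xs) / n
    m4 = sum((x - m) ** 4 for x in xs) / n
    return m4 / m2 ** 2 - 3.0


rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Hypothetical groups: same spread, very different means.
group_a = [rng.gauss(0.0, 1.0) for _ in range(2000)]
group_b = [rng.gauss(10.0, 1.0) for _ in range(2000)]

# Raw data pooled across groups: two humps, clearly not normal.
raw = group_a + group_b

# Residuals: each score minus its own group mean (a one-way ANOVA fit).
mean_a = statistics.fmean(group_a)
mean_b = statistics.fmean(group_b)
residuals = [x - mean_a for x in group_a] + [x - mean_b for x in group_b]

raw_kurt = excess_kurtosis(raw)
resid_kurt = excess_kurtosis(residuals)
print(f"excess kurtosis, pooled raw data: {raw_kurt:.2f}")
print(f"excess kurtosis, residuals:       {resid_kurt:.2f}")
```

The pooled data should show a strongly negative excess kurtosis (the signature of a bimodal shape), while the residuals should sit near zero. A histogram or Q-Q plot of the residuals would make the same point visually.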

HTH.
--
Bruce Weaver
***@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/Home
"When all else fails, RTFM."
Rich Ulrich
2014-03-29 01:57:58 UTC
On Fri, 28 Mar 2014 18:54:07 -0400, Bruce Weaver
Post by Bruce Weaver
http://hrprofessionalsmagazine.com/wp-content/uploads/2013/03/Normality-of-Performance-Paretian-Theory.pdf
"our results indicate that individual job performance
follows a Paretian distribution."

That is a bit too broad. It might be fairer
to say, "The usual measures of ... performance."
Clearly, if some measure is log-normal, then you get a
*normal* distribution when you decide to analyze the log of it.
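As a quick illustration of the log-normal point, here is a small Python sketch (standard library only). The simulated scores, the sample size, and the helper skewness are assumptions for the example, not data from the article.

```python
import math
import random
import statistics


def skewness(xs):
    """Moment-based sample skewness (about 0 for symmetric data)."""
    m = statistics.fmean(xs)
    n = len(xs)
    m2 = sum((x - m) ** 2 for x in xs) / n
    m3 = sum((x - m) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5


rng = random.Random(1)  # fixed seed so the sketch is reproducible

# Hypothetical performance scores drawn from a log-normal distribution.
scores = [rng.lognormvariate(0.0, 1.0) for _ in range(5000)]

# Taking logs of log-normal data gives normal data by definition.
log_scores = [math.log(s) for s in scores]

raw_skew = skewness(scores)
log_skew = skewness(log_scores)
print(f"skewness of raw scores: {raw_skew:.2f}")
print(f"skewness of log scores: {log_skew:.2f}")
```

The raw scores should come out heavily right-skewed and the logged scores roughly symmetric, so the usual ANOVA or regression could then be run on the log scale.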

The authors use "power distribution" to mean Pareto. That brings
us back to the Box-Cox family of power transformations: reciprocal,
log, square root, square.
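For reference, the one-parameter Box-Cox transformation is (x^lambda - 1) / lambda for lambda != 0 and log(x) for lambda = 0, defined for positive x. The sketch below (Python, standard library only; the function name box_cox and the sample value are just for illustration) shows how the four transformations named above correspond to lambda = -1, 0, 0.5 and 2, each up to a linear rescaling.

```python
import math


def box_cox(x, lam):
    """One-parameter Box-Cox transform of a positive value x."""
    if x <= 0:
        raise ValueError("Box-Cox requires x > 0")
    if lam == 0:
        return math.log(x)
    return (x ** lam - 1.0) / lam


x = 4.0
family = {
    "reciprocal (lambda = -1)": box_cox(x, -1.0),    # 1 - 1/x
    "log (lambda = 0)": box_cox(x, 0.0),             # log(x)
    "square root (lambda = 0.5)": box_cox(x, 0.5),   # 2 * (sqrt(x) - 1)
    "square (lambda = 2)": box_cox(x, 2.0),          # (x**2 - 1) / 2
}
for name, value in family.items():
    print(f"{name}: {value:.4f}")

# The log case is the limit of the power case as lambda approaches 0.
near_zero = box_cox(x, 1e-6)
```

In practice lambda is usually chosen by looking at residual plots or by maximum likelihood, not fixed in advance; the sketch only shows how the family fits together.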

- The authors do cite anecdotes of data where they
*theorize* that a few outcomes should be highly skewed -
with little or no evidence or discussion about how one
should regard the error at the extreme, or the prospect
of linearity with possible predictors. To my quick read, they
raise some questions for some particular studies without
actually suggesting or justifying any particular solution.

I'm all in favor of folks being more careful than they
have been, in selecting how they measure whatever
they measure. I have had the experience of arguing,
to no good effect at all,
"Look how good, how bivariate normal, this plot is, where
I have used the reciprocal of the Pre-Post measures ...
our ANOVAs will mostly be about the same, but with 20%
more power." -- "We will go ahead and analyze it like
everybody else does."
--
Rich Ulrich