Discussion:
overdispersed zero-inflated and nested
Jenny Hazlehurst
2014-05-08 23:00:03 UTC
Hey there,

I'm trying to analyze my dataset to determine the effect of treatment on nectar volume in flowers. Volumes were sampled from different flowers on the same individual at randomized times. Each individual plant received only one of the four treatments, so the volume data is nested by individual plant.

The volume data is 54% 0's, and is overdispersed.

Any ideas on analysis strategies? I'm used to R.

Thanks!

Jen
Rich Ulrich
2014-05-09 05:50:32 UTC
On Thu, 8 May 2014 16:00:03 -0700 (PDT), Jenny Hazlehurst
Post by Jenny Hazlehurst
Hey there,
I'm trying to analyze my dataset to determine the effect of treatment on nectar volume in flowers. Volumes were sampled from different flowers on the same individual at randomized times. Each individual plant received only one of the four treatments, so the volume data is nested by individual plant.
The volume data is 54% 0's, and is overdispersed.
Any ideas on analysis strategies? I'm used to R.
I've never used the programs that accommodate 0s and overdispersion,
but I think that 54% 0s is stretching the limits of what will give
a good model, unless the data really fall out grandly that way.

So - I would certainly start by seeing what message lies in
the information, taken by itself, of zero versus non-zero.

Dropping zeroes leaves what sort of distribution for the rest?
I would not be surprised by log-normal for measuring volumes
of nectar.
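
A minimal sketch of that two-part look, in Python purely for illustration. The record layout (treatment, plant, volume), the function name and the demo numbers are all hypothetical and not taken from the actual dataset. It tabulates the proportion of zeros per treatment and the mean and SD of log volume among the non-zero flowers. It ignores the nesting of flowers within plants, so it is a first descriptive pass only, not a model.

```python
import math
from collections import defaultdict


def two_part_summary(records):
    """Summarise zero vs non-zero and log(positive volume) per treatment.

    records: iterable of (treatment, plant_id, volume) tuples.
    This layout is an assumption made for the sketch.
    """
    groups = defaultdict(list)
    for treatment, _plant_id, volume in records:
        groups[treatment].append(volume)

    summary = {}
    for treatment, volumes in groups.items():
        # Part 1: how often is the volume exactly zero?
        logs = [math.log(v) for v in volumes if v > 0]
        n_pos = len(logs)
        # Part 2: location and spread of log volume for non-zero flowers.
        mean_log = sum(logs) / n_pos if n_pos else float("nan")
        if n_pos > 1:
            sd_log = math.sqrt(
                sum((x - mean_log) ** 2 for x in logs) / (n_pos - 1)
            )
        else:
            sd_log = float("nan")  # SD is undefined with fewer than 2 values
        summary[treatment] = {
            "n": len(volumes),
            "prop_zero": 1 - n_pos / len(volumes),
            "mean_log_positive": mean_log,
            "sd_log_positive": sd_log,
        }
    return summary


# Made-up demo data: (treatment, plant_id, volume)
demo = [
    ("A", 1, 0.0), ("A", 1, 2.0), ("A", 2, 0.0), ("A", 2, 8.0),
    ("B", 3, 0.0), ("B", 3, 0.0), ("B", 4, 0.0), ("B", 4, 1.0),
]
result = two_part_summary(demo)
for treatment in sorted(result):
    print(treatment, result[treatment])
```

If the log volumes of the non-zero flowers look roughly symmetric within treatments, that would be consistent with the log-normal guess; a proper analysis would still need a plant-level random effect in both parts.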
--
Rich Ulrich
David Jones
2014-05-09 15:01:22 UTC
"Rich Ulrich" wrote in message news:***@4ax.com...

On Thu, 8 May 2014 16:00:03 -0700 (PDT), Jenny Hazlehurst
Post by Jenny Hazlehurst
Hey there,
I'm trying to analyze my dataset to determine the effect of treatment on
nectar volume in flowers. Volumes were sampled from different flowers on
the same individual at randomized times. Each individual plant received
only one of the four treatments, so the volume data is nested by individual
plant.
The volume data is 54% 0's, and is overdispersed.
Any ideas on analysis strategies? I'm used to R.
I've never used the programs that accommodate 0s and overdispersion,
but I think that 54% 0s is stretching the limits of what will give
a good model, unless the data really fall out grandly that way.

So - I would certainly start by seeing what message lies in
the information, taken by itself, of zero versus non-zero.

Dropping zeroes leaves what sort of distribution for the rest?
I would not be surprised by log-normal for measuring volumes
of nectar.

Rich Ulrich

-------------------------------------------------------------

There is a class of models in which a mixed discrete-continuous family of
distributions arises fairly naturally; the simplest of these are compound
Poisson models. Here a given total is made up of contributions from a number
of events, where each event contributes a random amount (continuous
distribution) and where the number of events is also random (in simple
models this follows a Poisson distribution). Zero events give a zero total.
The derived distribution has separate parameters for the distributions of
event sizes and event numbers, and these may help in interpreting results.
A readable description is in Revfeim, K. J. A. (1990), "A theoretically
derived distribution for annual rainfall totals", Int. J. Climatol., 10:
647–650, doi: 10.1002/joc.3370100607 (not freely accessible online). A more
recent and more sophisticated paper is Withers, C. S. and Nadarajah, S.
(2011), "On the compound Poisson-gamma distribution", Kybernetika, 47(1):
15–37, at http://www.kybernetika.cz/content/2011/1/15/paper.pdf
There may well be better sources, and I have not looked to see what is
available in R.
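
To make the compounding idea concrete, here is a small simulation sketch in Python (standard library only). It assumes gamma-distributed event sizes, which is one common choice and not something established for the nectar data; the parameter values and function names are purely illustrative. The event rate 0.6 is picked so that the chance of a zero total, exp(-0.6), is about 0.55, roughly the share of zeros reported in the original post.

```python
import math
import random


def poisson_draw(rng, lam):
    """Draw a Poisson(lam) count by Knuth's multiplication method.

    Adequate for small lam; a library sampler would be preferable otherwise.
    """
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def compound_poisson_gamma(rng, lam, shape, scale):
    """One total: a Poisson number of events, each adding a gamma amount."""
    n_events = poisson_draw(rng, lam)
    # With zero events the sum is empty, so the total is exactly 0.
    return sum(rng.gammavariate(shape, scale) for _ in range(n_events))


rng = random.Random(2014)          # fixed seed for reproducibility
lam, shape, scale = 0.6, 2.0, 1.5  # illustrative values, not estimates

draws = [compound_poisson_gamma(rng, lam, shape, scale) for _ in range(20000)]

prop_zero = sum(1 for d in draws if d == 0) / len(draws)
sample_mean = sum(draws) / len(draws)

# Theory: P(total = 0) = exp(-lam); E[total] = lam * shape * scale.
expected_zero = math.exp(-lam)
expected_mean = lam * shape * scale

print("proportion of zeros:", prop_zero, "theory:", expected_zero)
print("mean total:", sample_mean, "theory:", expected_mean)
```

The point of the sketch is only that a single distribution with an event-rate parameter and event-size parameters produces both the exact zeros and the skewed positive values; whether it fits the nectar volumes is an empirical question.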

There remains a question about the OP's mention of "randomised times":
are these times observed and recorded, and so usable in the analysis, or
might it be necessary to treat them as an additional source of randomness
when deriving a model-based distribution in an extended version of the
above compounding approach?

All of this may be too theoretical and, as Rich suggests, an initial
empirical examination of the data would be best.

David Jones
Herman Rubin
2014-05-09 16:08:38 UTC
Post by Jenny Hazlehurst
Hey there,
I'm trying to analyze my dataset to determine the effect of treatment on
nectar volume in flowers. Volumes were sampled from different flowers
on the same individual at randomized times. Each individual plant
received only one of the four treatments, so the volume data is nested
by individual plant.
The volume data is 54% 0's, and is overdispersed.
Any ideas on analysis strategies? I'm used to R.
Thanks!
Jen
I doubt that a canned program is what you need; they all
assume some form of linearity, and can only handle restricted
types of models. You may even need to account for the
accuracy of the measurement of the amount of nectar in a
flower when analyzing your data.

The model must come from the user, not the statistician. Also,
formulate the model without any assumption of statistical methods;
the methods may even have to be created for the problem.
--
This address is for information only. I do not claim that these views
are those of the Statistics Department or of Purdue University.
Herman Rubin, Department of Statistics, Purdue University
***@stat.purdue.edu Phone: (765)494-6054 FAX: (765)494-0558