Poisson vs Logit models


f***@gmail.com

2015-04-11 23:54:53 UTC

Hi Everyone,

I have a question that maybe you can help with.

In non-life pricing, the frequency regression is based on Poisson regression. But because the exposure (the number of policyholders) is not the same in every tariff cell, one usually uses an offset approach or a weighting procedure. Check help("Insurance", package = "MASS") in R for an example.

But why not use a Logistic Regression?

#the standard procedure is this:
data(Insurance, package="MASS")
glm(Claims ~ District + Group + Age + offset(log(Holders)), data = Insurance, family = poisson)

#but why not?
glm(Claims/Holders ~ District + Group + Age, data = Insurance, weights = Holders, family = binomial(link = "logit"))
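
The offset approach and the weighting procedure mentioned above can be compared directly. The following is only a sketch, assuming the weighted form means modelling the rate Claims/Holders with Holders as prior weights; the object names (fit_offset, fit_weight, max_coef_diff) are made up for illustration.

```r
# Sketch: offset vs. weighted formulation of the Poisson frequency model.
data(Insurance, package = "MASS")

# Offset form: model the claim count, with log(Holders) as a fixed offset.
fit_offset <- glm(Claims ~ District + Group + Age + offset(log(Holders)),
                  data = Insurance, family = poisson)

# Weighted form: model the claim rate, weighting each cell by its exposure.
# R warns about non-integer responses here, so the warnings are suppressed.
fit_weight <- suppressWarnings(
  glm(Claims / Holders ~ District + Group + Age,
      data = Insurance, family = poisson, weights = Holders)
)

# Largest absolute difference between the two coefficient vectors.
max_coef_diff <- max(abs(coef(fit_offset) - coef(fit_weight)))
print(max_coef_diff)
```

Both forms solve the same estimating equations, so the coefficients should agree up to the convergence tolerance of glm.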

Rich Ulrich

2015-05-11 16:18:15 UTC

Nobody has offered an answer in a month, so here is
my shot, even though I don't know what
"non-life pricing" is. Here are a couple of notions
about why Poisson is the model, not logistic.

First suggestion: The logit transformation is used for outcomes
that are bounded between 0 and 100%, and it is symmetric around 50%.
I don't see how "Claims/Holders" would fit that model
unless there is something hidden in the definitions.
Perhaps a log link is what you had in mind?
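
One way to look at the logit versus log question is to fit both and compare the fitted claim frequencies. This is a rough sketch, not a recommendation: it assumes each policyholder makes at most one claim, so that Claims can be treated as "successes" out of Holders "trials", and the names rate, fit_pois, fit_logit, freq_pois and freq_logit are made up for illustration.

```r
data(Insurance, package = "MASS")

# Observed claim frequency per tariff cell (assumed here to be a proportion).
rate <- with(Insurance, Claims / Holders)
print(range(rate))

# Poisson model with log link and exposure offset.
fit_pois <- glm(Claims ~ District + Group + Age + offset(log(Holders)),
                data = Insurance, family = poisson)

# Binomial model with logit link: Claims out of Holders (an assumption).
fit_logit <- glm(cbind(Claims, Holders - Claims) ~ District + Group + Age,
                 data = Insurance, family = binomial(link = "logit"))

# Fitted claim frequency per policyholder under each model.
freq_pois  <- fitted(fit_pois) / Insurance$Holders
freq_logit <- fitted(fit_logit)

# Summary of the cell-by-cell differences between the two sets of fits.
print(summary(freq_pois - freq_logit))
```

When the frequencies are small, log(p/(1-p)) is close to log(p), so the two links tend to give similar fitted values; they differ more as the frequency grows.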

Second suggestion: Any regression is evaluated by the
reduction of the error term, and the purpose of a Poisson
model is to accommodate errors that are presumed to be
Poisson: Poisson is often natural for counts where events
are generated independently. I don't see why there would
be a binomial distribution for a term like "Claims/Holders", or any
simple distribution for that at all. Since I don't know what
those are pointing to in the real world, I could be missing it.
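
Whether the Poisson error assumption is plausible for these data can be checked roughly with the Pearson dispersion statistic. A minimal sketch, assuming the offset model from the original post; the names fit_pois, pearson_chisq and dispersion are made up for illustration.

```r
data(Insurance, package = "MASS")

# Poisson frequency model with exposure offset, as in the original post.
fit_pois <- glm(Claims ~ District + Group + Age + offset(log(Holders)),
                data = Insurance, family = poisson)

# Pearson chi-square statistic divided by the residual degrees of freedom.
pearson_chisq <- sum(residuals(fit_pois, type = "pearson")^2)
dispersion <- pearson_chisq / df.residual(fit_pois)
print(dispersion)
```

A value near 1 is consistent with Poisson variation; a value well above 1 would suggest overdispersion, in which case a quasi-Poisson or negative binomial model might be considered.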

--
Rich Ulrich

Herman Rubin

2015-05-13 16:11:42 UTC

The above might be reasonable, but what to do is unclear. The
assumptions about the probability model need to be made by the
user NOT CONSIDERING WHAT PROGRAMS ARE AVAILABLE, and then
statistical advice can be given, possibly even requiring a
procedure not in the package which is being used, or any other
package. Statistics is not merely a collection of procedures
from which to choose.

--
This address is for information only. I do not claim that these views
are those of the Statistics Department or of Purdue University.
Herman Rubin, Department of Statistics, Purdue University
***@stat.purdue.edu Phone: (765)494-6054 FAX: (765)494-0558