Discussion:
Sample for Determining Confidence Interval for Sensitivity
norky
2008-12-05 14:20:08 UTC
Hi All:

I am calculating sensitivity and specificity as well as PPV and NPV of
a diagnostic tool. For the purposes of this study a sensitivity of
85%-100% will be considered a good predictor. I would also like to
calculate a 95% confidence interval. How do I determine the
appropriate sample size? I am getting conflicting answers from
others. Any help would be appreciated.

Thanks!
John Uebersax
2008-12-05 15:01:12 UTC
My suggestion would be to use the same methods one would use to
estimate confidence intervals (and/or statistical power) for a simple
proportion. In a sense, sensitivity is just a proportion (similarly
for Sp, PPV, and NPV).

There are several different methods for estimating the CI of a
proportion; since these diagnostic indices often approach or reach
1.0, methods based on the normal approximation to the binomial are
less preferred (since they can result in CIs that exceed 1.0).
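As a rough illustration of that point, here is a small Python sketch (with made-up counts, not data from this thread) comparing the normal-approximation (Wald) interval with the Wilson score interval, one standard alternative that stays inside (0, 1):

```python
import math

def wald_ci(x, n, z=1.96):
    """Normal-approximation (Wald) CI for a proportion -- can exceed 1.0."""
    p = x / n
    se = math.sqrt(p * (1 - p) / n)
    return p - z * se, p + z * se

def wilson_ci(x, n, z=1.96):
    """Wilson score CI -- limits stay inside (0, 1) even for extreme p."""
    p = x / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2))
    return centre - half, centre + half

# e.g. 29 true positives out of 30 diseased subjects (Se ~ .97):
print(wald_ci(29, 30))    # upper limit pokes past 1.0
print(wilson_ci(29, 30))  # both limits inside (0, 1)
```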

HTH

John Uebersax PhD
Bruce Weaver
2008-12-05 15:37:04 UTC
You say you are *calculating* sensitivity, specificity, PPV and NPV.
Presumably then, you have a 2x2 table. For sensitivity, the sample
size is the first column sum; for specificity, it is the second
column sum; for PPV and NPV, it is the first and second row sums
respectively. Or am I missing something?
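The indices can be read straight off such a table; a minimal Python sketch with made-up cell counts (new test on the rows, gold standard on the columns):

```python
# Illustrative 2x2 table -- all cell counts are invented.
tp, fp = 90, 5    # test-positive row: true positives, false positives
fn, tn = 10, 95   # test-negative row: false negatives, true negatives

sensitivity = tp / (tp + fn)   # n = first column sum (gold-standard positive)
specificity = tn / (fp + tn)   # n = second column sum (gold-standard negative)
ppv = tp / (tp + fp)           # n = first row sum (test positive)
npv = tn / (fn + tn)           # n = second row sum (test negative)

print(sensitivity, specificity, ppv, npv)  # Se = 0.90, Sp = 0.95, ...
```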

--
Bruce Weaver
"When all else fails, RTFM."
norky
2008-12-05 16:35:55 UTC

The issue that I think you are addressing, Bruce, is why I need a
sample size calculation. Yes, it is a simple 2x2 table with the new
test on the rows and the gold standard on top. I was told by some of
my collaborators that our estimates of sensitivity, specificity,
etc. should have a given precision, i.e., if I get a sensitivity of
90%, I would want a precise 95% confidence interval. I have a SAS
macro that will provide the CI for diagnostic test characteristics.
However, I would like the CI of these estimates to be narrow. I was
told two different ways to calculate the sample size necessary to
estimate the CI with the desired precision. One was the formula for
a simple proportion, which was also suggested in one of the
responses here; the other was much more complicated and involved
the arcsine method. Thoughts?
John Uebersax
2008-12-05 17:37:57 UTC
It seems like there's a possible misunderstanding: Bruce may have
been referring literally to your actual sample sizes -- i.e., the
row/column marginal totals relevant to the calculation of Se, Sp,
PPV and NPV.

I've never heard of using an arcsine transformation here, but I
suppose it's possible.

http://en.wikipedia.org/wiki/Binomial_proportion_confidence_interval

The Agresti & Coull (1998) paper is a classic. The Clopper-Pearson
method is a well-known standard -- not necessarily the best by
modern standards, but most likely sufficient for your needs.

To estimate the required sample size, you could try the normal
approximation to the binomial first:

1. Decide what you expect Se to be (e.g., .90).

2. Decide how accurate you want the results to be; e.g., 95%
confidence that you estimate the true population value within
+/- .05 -- so, roughly, that means you want the standard error to
be around .025 (or, more accurately, .05/1.96).

3. Use the normal approximation formula for the standard error of a
proportion to solve for N.
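Steps 1-3 collapse into one line of algebra, n = z^2 * p(1-p) / d^2, where d is the desired half-width. A small Python sketch (the .90 and .05 figures are the examples from step 1 and step 2):

```python
import math

def n_for_proportion_ci(p, half_width, z=1.96):
    """Smallest n for which the normal-approximation CI of a proportion
    has at most the requested half-width: n >= z^2 * p * (1 - p) / d^2."""
    return math.ceil(z**2 * p * (1 - p) / half_width**2)

# Expected Se of .90, 95% confidence within +/- .05:
print(n_for_proportion_ci(0.90, 0.05))  # 139
```

Note that for sensitivity this n counts only the gold-standard-positive subjects (the first column sum), so the total study size also depends on the expected prevalence among those recruited.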

This should be okay unless your expected Se is very high (e.g., .98)
and/or your target confidence range is wide (e.g., +/- .10). In that
case you've got the issue that the sampling distribution around the
point estimate is not symmetrical. At that point you start working
with cumulative beta (and inverse cumulative beta) distributions --
per the web page cited above.
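For that exact approach, here is a hedged Python sketch of a Clopper-Pearson interval; instead of an inverse-beta routine it bisects the equivalent binomial tail probabilities, so it needs only the standard library (the 29-of-30 example is made up):

```python
import math

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), summed exactly via math.comb."""
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def clopper_pearson(x, n, alpha=0.05):
    """Exact (Clopper-Pearson) interval, found by bisecting the binomial
    tail probabilities; equivalent to the inverse-beta formulation."""
    def solve(pred):
        lo, hi = 0.0, 1.0
        for _ in range(60):        # bisection; pred is True below the root
            mid = (lo + hi) / 2
            if pred(mid):
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2
    # lower limit: the p at which P(X >= x | p) rises to alpha/2
    lower = 0.0 if x == 0 else solve(lambda p: 1 - binom_cdf(x - 1, n, p) <= alpha / 2)
    # upper limit: the p at which P(X <= x | p) falls to alpha/2
    upper = 1.0 if x == n else solve(lambda p: binom_cdf(x, n, p) > alpha / 2)
    return lower, upper

# e.g. 29 of 30 diseased subjects test positive:
print(clopper_pearson(29, 30))  # roughly (0.83, 0.999)
```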

John Uebersax PhD
RichUlrich
2008-12-05 19:10:22 UTC
When you say, "determine the appropriate sample size",
you seem to cast this into the paradigm of doing a power
analysis. For the sake of a power analysis, it is usually
advisable to have the whole proposed table -- that is,
for instance, specificity as well as sensitivity.

However, if you want to put a narrow CI on the effect size,
the way to do that is to extrapolate from hypothesized
effects, and escalate the Ns in order to achieve the narrowness
of the CI that you need. Is specificity a good measure of effect?

I have a little difficulty arising from the fact that specificity,
by itself, is not at all assured to be a good measure of "effect"
for a 2x2 table, when compared to the Odds Ratio or even to
the phi coefficient.

Another thing that complicates this, for your example, is that
there are several ways to estimate the CI for extreme proportions.
The arcsine transformation is not very good outside of (10%, 90%),
but it is better than the additive formula (the one that gives
symmetric limits around the point estimate).
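For reference, the arcsine-transformation interval mentioned above can be sketched in a few lines of Python (illustrative counts only; note the back-transformed limits cannot leave [0, 1]):

```python
import math

def arcsine_ci(x, n, z=1.96):
    """CI via the arcsine (angular) transformation: phi = arcsin(sqrt(p))
    has approximate variance 1/(4n), so the interval is built on the phi
    scale and back-transformed with sin^2."""
    phi = math.asin(math.sqrt(x / n))
    half = z / (2 * math.sqrt(n))
    lower = math.sin(max(phi - half, 0.0)) ** 2
    upper = math.sin(min(phi + half, math.pi / 2)) ** 2
    return lower, upper

# e.g. 29 of 30 (made-up numbers) -- an extreme proportion, where the
# caveat about the approximation degrading outside (10%, 90%) applies:
print(arcsine_ci(29, 30))
```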

I'll suggest that you read up on Power analysis in Cohen, to
see if "sensitivity" is really what you want to look at. His
chapter on proportions does use the arcsine transformation,
if you want to go by that. Looking only for "significant" effects,
I've always adapted the 2x2 table from the chapter on rxk tables,
since the latter was more accurate for my purposes.
--
Rich Ulrich
Bruce Weaver
2008-12-05 21:51:09 UTC
Which of those methods does your SAS macro compute? Or does it
compute both? You might want to look at Robert Newcombe's website--it
has some good info on confidence intervals for proportions. Mind the
line-wrap:

http://www.cardiff.ac.uk/medic/contactsandpeople/n/newcombe-robert-gordon-prof-overview_new.html

Regarding how to determine the sample size, why not just plug in a few
different values of N, and narrow it down until you get a CI that is
tight enough? It's not very elegant, but it should get you an answer
fairly quickly.
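That trial-and-error search is easy to script. A Python sketch, assuming (my choices for illustration, not from the thread) a Wilson score interval, an expected sensitivity of .90, and a target total width of .10:

```python
import math

def wilson_width(x, n, z=1.96):
    """Full width of the Wilson score interval for x successes out of n."""
    p = x / n
    denom = 1 + z**2 / n
    half = (z / denom) * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2))
    return 2 * half

def smallest_n(p_expected, max_width, start=10):
    """Step n upward until the interval computed at the expected
    proportion is narrow enough -- the plug-in-and-check idea."""
    n = start
    while wilson_width(round(p_expected * n), n) > max_width:
        n += 1
    return n

print(smallest_n(0.90, 0.10))
```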

--
Bruce Weaver