1. Determining the minimum sample size required to achieve a specified precision in a sample mean.

Sample means estimate population means, and sample statistics such as the standard error and confidence intervals can be used to measure the precision of the estimate. However, these are post hoc (after the event) measures of precision. Suppose you wish to determine, a priori (prior to starting), the sample size needed to reach a specified precision. What method is used?

The process is iterative, in that it first requires an estimate of the population standard deviation, which can be obtained from a preliminary sample. The subsequent technique depends on the size of the preliminary sample: if the sample was 'large' (typically > 30), a z statistic is used; if the sample was 'small', a t statistic is used.

Example 1
The effect of aircraft noise on the delay in getting to sleep was tested on 15 people, and the mean time from lights out to sleep was 18.5 minutes, with a standard deviation of 2.5 minutes. If we wish to estimate the population mean to within 1 minute, what sample size is needed?

The equation needed to determine n is

$$n = \left(\frac{t_{\alpha,2}\, s}{\text{precision}}\right)^2$$

For 14 degrees of freedom and a 95% confidence interval, t(0.05,2) is 2.145. Therefore n = (2.145 × 2.5 / 1.0)² = 28.8 or, after rounding, 29.

If we had obtained a large sample, for example a mean of 18.25 (s = 2.1) from a sample of 100, we would use:

$$n = \left(\frac{z_{\alpha,2}\, s}{\text{precision}}\right)^2$$

Using a 95% confidence interval, z(0.05,2) is 1.96. Therefore n = (1.96 × 2.1 / 1.0)² = 16.9 or, after rounding, 17.

This result tells us that the preliminary sample of 100 was far larger than needed for our desired level of precision. The smaller sample size in the second example is due to two factors:

  • the standard deviation was smaller (using s = 2.5 with z = 1.96 gives a sample size of 24);
  • we have greater confidence in our estimate of the population standard deviation, which is reflected in our use of a z statistic in place of the t statistic.

Note that we can use this method to achieve a precision defined in percentage terms. For example, to estimate the population mean to within ±10%, the required precision would be 1.85 minutes (10% of the sample mean of 18.5 minutes).
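The calculations in this section, including the ±10% case, can be sketched in Python. This is a minimal sketch that follows the text's own approach: the critical value (t or z) is read from tables and passed in rather than computed, and the function name `n_for_precision` is our own, hypothetical choice.

```python
import math


def n_for_precision(crit, s, precision):
    """Unrounded sample size from n = (crit * s / precision) ** 2.

    crit      -- tabulated two-tailed critical value: t with (n0 - 1) df
                 for a small preliminary sample of size n0, z for a large one
    s         -- standard deviation of the preliminary sample
    precision -- required half-width of the confidence interval,
                 in the same units as s
    """
    return (crit * s / precision) ** 2


# Example 1: 15 people, s = 2.5 min, t(0.05, 14 df) = 2.145, within 1 min
small = n_for_precision(2.145, 2.5, 1.0)         # about 28.76
# Large preliminary sample: s = 2.1 min, z = 1.96
large = n_for_precision(1.96, 2.1, 1.0)          # about 16.94
# z with s = 2.5, isolating the effect of the larger standard deviation
same_s = n_for_precision(1.96, 2.5, 1.0)         # about 24.01
# Within +/-10% of the 18.5 min mean, i.e. a precision of 1.85 min
pct = n_for_precision(2.145, 2.5, 0.10 * 18.5)   # about 8.40

for label, n in [("t, s = 2.5", small), ("z, s = 2.1", large),
                 ("z, s = 2.5", same_s), ("t, +/-10%", pct)]:
    # Rounding up guarantees the requested precision is reached
    print(f"{label}: n = {n:.2f}, rounded up = {math.ceil(n)}")
```

Rounding up is the conservative choice. The text rounds to the nearest whole number, which gives the same answers except for the s = 2.5 comparison, where 24.01 becomes 25 rather than 24. The ±10% case needs about 8.4, i.e. 9 observations.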

The following two examples are based on Campbell et al (1995).
2. Two samples of continuous data (e.g. an unpaired t-test).

You must first decide on:

  • the minimum difference that is biologically meaningful; this is the effect size, or minimum detectable difference (mdd);
  • the significance level (alpha);
  • the power (1 - beta), where beta is the probability of a Type II error.

You also require an estimate of the variability of your observations in the form of a standard deviation (s).

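$$m = \frac{2(z_{\alpha} + z_{\beta})^2}{d^2} + \frac{z_{\alpha}^2}{4}$$

where d = mdd/s and m is the minimum sample size in each group. The z values are from standard normal tables. Some common values are: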

Error rate                          z
alpha = 0.025 (0.05 two-tailed)     1.96
alpha = 0.005 (0.01 two-tailed)     2.58
beta = 0.1 (power = 0.9)            1.28
beta = 0.2 (power = 0.8)            0.84
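Instead of reading the z values from tables, they can be computed from the standard normal distribution. A minimal Python sketch using the standard library's `statistics.NormalDist` (Python 3.8+); the helper names `z_alpha` and `z_beta` are our own.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def z_alpha(alpha_two_tailed):
    """Critical z for a two-tailed significance level (alpha/2 in each tail)."""
    return STD_NORMAL.inv_cdf(1 - alpha_two_tailed / 2)


def z_beta(power):
    """z for the desired power, where beta = 1 - power."""
    return STD_NORMAL.inv_cdf(power)


for a in (0.05, 0.01):
    print(f"alpha = {a} two-tailed: z = {z_alpha(a):.2f}")
for p in (0.9, 0.8):
    print(f"power = {p}: z = {z_beta(p):.2f}")
```

Rounded to two decimal places these reproduce the tabulated values 1.96, 2.58, 1.28 and 0.84.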

Example calculations

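The first example is based on the previous example. What is m for an mdd of 10 when s = 3.87 (equivalent to a variance of 15)? Thus, d = 10/3.87 = 2.58. With alpha = 0.05 (two-tailed) and beta = 0.1:

$$m = \frac{2(1.96 + 1.28)^2}{2.58^2} + \frac{1.96^2}{4} = 4.11$$

or, after rounding up, 5.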

The sample size of 5 is in close agreement with the results generated by Power Plant.

What is m for an mdd of 3 when s = 2? Thus, d = 3/2 = 1.5, and

$$m = \frac{2(1.96 + 1.28)^2}{1.5^2} + \frac{1.96^2}{4} = 10.29$$

or, rounding up, 11.

Note that the above calculations can be simplified. For alpha = 0.05 (two-tailed) and beta = 0.1, 2(1.96 + 1.28)² ≈ 21 and 1.96²/4 ≈ 1, so the equation is approximately:

$$m \approx \frac{21}{d^2} + 1$$
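Both worked examples and the shortcut can be checked in Python. A minimal sketch, assuming the tabulated values z_alpha = 1.96 (alpha = 0.05 two-tailed) and z_beta = 1.28 (power = 0.9); the function names are hypothetical.

```python
import math


def two_means_m(d, z_alpha=1.96, z_beta=1.28):
    """Sample size per group: 2(z_a + z_b)^2 / d^2 + z_a^2 / 4, with d = mdd / s."""
    return 2 * (z_alpha + z_beta) ** 2 / d ** 2 + z_alpha ** 2 / 4


def two_means_shortcut(d):
    """Approximation 21 / d^2 + 1, valid only for alpha = 0.05, power = 0.9."""
    return 21 / d ** 2 + 1


for mdd, s in [(10, 3.87), (3, 2)]:
    d = mdd / s
    m = two_means_m(d)
    print(f"mdd = {mdd}, s = {s}: d = {d:.2f}, m = {m:.2f} "
          f"(round up to {math.ceil(m)}), shortcut = {two_means_shortcut(d):.2f}")
```

Because d = 10/3.87 is not rounded to 2.58 here, the first example gives m ≈ 4.10 rather than 4.11; both round up to 5.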

3. The difference between two proportions

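Campbell et al (1995) provide an approximate formula to determine the sample size required to detect a specified difference in proportions:

$$m = \frac{(z_{\alpha} + z_{\beta})^2 \left[p_A(1 - p_A) + p_B(1 - p_B)\right]}{d^2}$$

where d = pA - pB and m is the sample size in each group.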

Suppose that you know the proportion of patients experiencing a particular infection is 0.2. You have a new treatment that you think may decrease this proportion. You set the effect size at 0.05, i.e. a 25% reduction to 0.15. Thus, d = 0.2 - 0.15 = 0.05. What sample size is needed to detect such a reduction?

Assume, as previously, that alpha is 0.05 (two-tailed) and beta is 0.1.

m = approximately 1200!

If we wish to detect a 50% reduction from 0.2 to 0.1 the required sample size is:

m = approximately 260.
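The two proportion calculations can be sketched in Python. This sketch rests on an assumption: it uses the unpooled approximation m = (z_alpha + z_beta)² [pA(1 - pA) + pB(1 - pB)] / d², which reproduces the figures of roughly 1200 and 260 given above; check the exact form against Campbell et al (1995) before relying on it. The function name is hypothetical.

```python
import math


def two_proportions_m(p_a, p_b, z_alpha=1.96, z_beta=1.28):
    """Approximate sample size per group to detect d = p_a - p_b.

    Assumed (unpooled) form:
    m = (z_alpha + z_beta)^2 * [p_a(1 - p_a) + p_b(1 - p_b)] / d^2
    """
    d = p_a - p_b
    variance_sum = p_a * (1 - p_a) + p_b * (1 - p_b)
    return (z_alpha + z_beta) ** 2 * variance_sum / d ** 2


m_25 = two_proportions_m(0.20, 0.15)  # 25% reduction: roughly 1207
m_50 = two_proportions_m(0.20, 0.10)  # 50% reduction: roughly 262
print(math.ceil(m_25), math.ceil(m_50))
```

Because d enters the formula squared, halving the detectable difference from 0.10 to 0.05 more than quadruples m.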

4. Detecting temporal trends

Detecting temporal trends is an important goal for many studies, for example identifying declining populations of endangered species or increases in disease incidence. The problem is one of picking the 'signal' out of the 'noise' caused by seasonal and stochastic variation. Determining the sampling effort required to identify trends is complex because many parameters are involved, for example:

  • The number of plots monitored.
  • The magnitude of the counts per plot.
  • The amount of stochastic variation.
  • The length of the monitoring period.
  • The interval between counts (monthly, annual, biennial).
  • The magnitude of the trend.
  • The significance level.

Because of this complexity, it is very difficult to provide simple rules for estimating power. Fortunately, some public domain software is available. The following example is taken from the Monitor user manual.

The Dachigam Wildlife Sanctuary, Kashmir, India, has a population of Himalayan black bears (Selenarctos thibetanus). Unfortunately, little is known about the population's status or trends. For most of the year the bears are scattered throughout the sanctuary and are very difficult to count. However, during the peak fruiting period of local mast-bearing trees, most bears in the sanctuary travel to a large, central grove of masting trees to forage, and it is then possible to get repeatable counts of the number of bears traveling to and from the grove on any given day.

Baseline data were obtained from 15 separate, day-long counts of the bears. The average was 15.6 bears per day, with a standard deviation of 3.6 bears. Would monitoring by one park ranger on 3 separate days each year, over a 10-year period, be sufficient to detect annual linear trends (positive and negative) of at least 3% in the bear population with a power > 0.90?

The results from a range of simulated conditions using the Monitor software are presented below.

Power to detect trends in a Himalayan black bear population surveyed annually over a 10 year period in Dachigam Wildlife Sanctuary, Kashmir, India. These data were provided by Vasant K. Saberwal.

                Number of counts/year
Trend (%)       3       5       10
-10             0.99    1.00    1.00
-5              0.74    0.94    1.00
-3              0.42    0.61    0.88
-2              0.22    0.29    0.62
-1              0.12    0.14    0.21
0               0.045   0.046   0.046
+1              0.078   0.15    0.28
+2              0.29    0.48    0.77
+3              0.65    0.87    0.99
+5              0.98    1.00    1.00
+10             1.00    1.00    1.00

These results show that 3 counts per year do not give sufficient power (> 0.90) to detect a 3% trend in either direction. Note that the power differs between increasing and decreasing trends. Increasing to 5 counts per year raises the power to detect a 3% increase to 0.87, and 10 counts raise it to 0.99; for a 3% decline, even 10 counts give a power of 0.88, just short of the 0.90 target.
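The Monitor results above come from simulation. As a hedged illustration of the general idea only, the sketch below estimates power by repeatedly simulating counts with a linear trend, fitting a least-squares regression line, and recording how often the slope is significant. Its model is a deliberate, hypothetical simplification and not Monitor's internal model: constant normal noise with the baseline SD of 3.6, a trend expressed as a percentage of the starting mean of 15.6 per year, and a z rather than t critical value. The name `trend_power` is our own.

```python
import random
import statistics


def trend_power(trend_pct, counts_per_year, years=10, mean0=15.6, sd=3.6,
                n_sims=1000, alpha=0.05, seed=1):
    """Estimate power to detect a linear trend by Monte Carlo simulation.

    Hypothetical simplifying assumptions (not Monitor's model):
    - expected count in year t is mean0 * (1 + trend_pct / 100 * t)
    - each count adds independent normal noise with constant SD `sd`
    - the slope is tested two-tailed against a normal (z) critical value
    """
    rng = random.Random(seed)
    z_crit = statistics.NormalDist().inv_cdf(1 - alpha / 2)
    # One x value (year index 0..years-1) per count
    xs = [t for t in range(years) for _ in range(counts_per_year)]
    x_bar = statistics.fmean(xs)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    rejections = 0
    for _ in range(n_sims):
        ys = [rng.gauss(mean0 * (1 + trend_pct / 100 * x), sd) for x in xs]
        y_bar = statistics.fmean(ys)
        # Ordinary least-squares slope and intercept
        slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
        intercept = y_bar - slope * x_bar
        # Standard error of the slope from the residual variance
        rss = sum((y - intercept - slope * x) ** 2 for x, y in zip(xs, ys))
        se = (rss / (len(xs) - 2) / sxx) ** 0.5
        if abs(slope / se) > z_crit:
            rejections += 1
    return rejections / n_sims


for k in (3, 5, 10):
    down, up = trend_power(-3, k), trend_power(3, k)
    print(f"{k} counts/year: power {down:.2f} (-3%), {up:.2f} (+3%)")
```

Because the noise SD is held constant, this sketch gives roughly symmetric power for increases and declines and will not reproduce the asymmetry in the Monitor table; it only shows how simulated power estimates are built up.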

5. Power analyses for regression lines

Trends is a free DOS program that can be used to analyse the power of regression lines. Although originally written to monitor changes in population size, it can easily be applied to other situations.