Effect of "Prevalence" on Kappa

Suppose you are planning a reliability study and have some control over the number of cases expected to be positive and the number expected to be negative.  What proportion should you use?  Convenience might suggest strongly skewing the proportion one way or the other.  The numerical example below suggests balancing the proportions instead.

Kappa is a fairly simple calculation with a fairly simple correction for chance agreement: kappa = (observed agreement - chance agreement) / (1 - chance agreement), where the chance agreement is computed from the marginal totals of the two raters.  But there is a price to pay, and the price goes by the name of "The Kappa Paradox" (Feinstein and Cicchetti) or the "Base Rate Problem."  In short, unbalancing the positive-agreement and negative-agreement cells decreases kappa even when the percentage agreement between the two observers stays constant.
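As a concrete sketch of that chance correction, the short Python function below (an illustration written for this note, not code from the original article; the function name cohens_kappa_2x2 is just a label) computes kappa for a 2x2 table of two raters, where a and d are the (++) and (--) agreement cells and b and c are the two disagreement cells.

```python
def cohens_kappa_2x2(a, b, c, d):
    """Cohen's kappa for a 2x2 agreement table.

    a = both raters positive (++)    b = rater 1 positive, rater 2 negative (+-)
    c = rater 1 negative, rater 2 positive (-+)    d = both raters negative (--)
    """
    n = a + b + c + d
    p_observed = (a + d) / n                       # raw percentage agreement
    # Chance agreement from each rater's marginal proportions
    p_chance = ((a + b) / n) * ((a + c) / n) + ((c + d) / n) * ((b + d) / n)
    return (p_observed - p_chance) / (1 - p_chance)
```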

Take a numerical example.  Two observers agree 90% of the time on 100 observations.  So 45 observations are both positive (++), 45 are both negative (--), 5 are (+-), and 5 are (-+).  You can calculate that kappa is 0.80: observed agreement is 0.90, chance agreement is 0.50, so kappa = (0.90 - 0.50) / (1 - 0.50) = 0.80.  Now leave both disagreement cells at 5 and systematically change the agreement cells while keeping 90% agreement, so the (++) and (--) cells go to 50 and 40, then 55 and 35, and so on.  Plotting kappa against the ratio of the (++) cell to the (--) cell over a range of ratios gives the results shown in Figure 1 and Figure 2.  Figure 1 uses a linear axis and Figure 2 uses a logarithmic axis.
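The sweep behind Figures 1 and 2 can be reproduced with a short script.  The sketch below is hypothetical code, not from the original article; it assumes the cohens_kappa_2x2 function defined above, holds total agreement at 90 out of 100 observations, splits the 10 disagreements evenly, and varies the (++)/(--) split.

```python
# Reproduce the data behind Figures 1 and 2: kappa vs. the (++)/(--) ratio
# at a constant 90% agreement (uses cohens_kappa_2x2 from the sketch above).
total, agree = 100, 90
disagree_each = (total - agree) // 2           # 5 in each disagreement cell

for pos_agree in range(5, 86, 5):              # (++) cell from 5 to 85
    neg_agree = agree - pos_agree              # (--) cell keeps agreement at 90
    ratio = pos_agree / neg_agree
    k = cohens_kappa_2x2(pos_agree, disagree_each, disagree_each, neg_agree)
    print(f"(++)/(--) ratio = {ratio:6.2f}   kappa = {k:.3f}")

# Kappa peaks at 0.80 when the ratio is 1.0 (45 and 45) and falls off
# as the agreement cells become unbalanced in either direction.
```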

Figure 1.  Effect of "Prevalence" shown with a linear scale.  A numerical example using various percentages of agreement, with the ratio of both raters positive (++) to both raters negative (--) on the horizontal axis.  Note that kappa is at its maximum when the ratio is 1.0, so equal groups of positive and negative cases are best.  Note also that the penalty in decreased kappa is not great for ratios from about 0.5 to 2.0.

Figure 2.  Effect of "Prevalence" shown with a logarithmic scale.  Same data as Figure 1, but the horizontal axis is the base-10 logarithm of the ratio of agreed positive to agreed negative.

To push this idea further, what is the effect of unbalancing the (++) and (--) groups by about 2 to 1? Table 1 gives some sense of this for various percentages of agreement.

Table 1
Percentage    Maximum    Max Ratio    Min Ratio    Kappa at
Agreement     Kappa      (++)/(--)    (++)/(--)    Max/Min Ratio
90%           0.80       2.6          0.385        0.76
80%           0.60       2.2          0.454        0.56
70%           0.40       2.2          0.47         0.36
60%           0.20       2.0          0.50         0.17
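The kappa columns of Table 1 can be checked with the same kappa function.  The sketch below is hypothetical code, not from the original article; it assumes 100 observations with the disagreements split evenly between the two off-diagonal cells, as in the earlier example, and sets the (++)/(--) split at the listed max ratio.  Under those assumptions the printed values match the Maximum Kappa and final Kappa columns of Table 1.

```python
# Check Table 1: maximum kappa (ratio 1.0) and kappa at the listed (++)/(--)
# ratio for each agreement level, assuming 100 observations and an even
# split of the disagreements (uses cohens_kappa_2x2 from the sketch above).
cases = [(90, 2.6), (80, 2.2), (70, 2.2), (60, 2.0)]   # (% agreement, max ratio)

for agree, ratio in cases:
    disagree_each = (100 - agree) / 2
    neg_agree = agree / (1 + ratio)        # (--) cell
    pos_agree = agree - neg_agree          # (++) cell, so pos/neg equals ratio
    k_max = cohens_kappa_2x2(agree / 2, disagree_each, disagree_each, agree / 2)
    k_at_ratio = cohens_kappa_2x2(pos_agree, disagree_each, disagree_each, neg_agree)
    print(f"{agree}% agreement: max kappa = {k_max:.2f}, "
          f"kappa at ratio {ratio} = {k_at_ratio:.2f}")
```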

What can we make of this table? First, for every 10-percentage-point drop in percent agreement, kappa drops by 0.20. So train your observers well. Second, the drop in kappa caused by unbalancing the (++) and (--) cells is only a few hundredths as long as the ratio stays between about 1:2 and 2:1. So select your cases well.

Revised 09/27/03.